Solved

SQL - Grouping and missing data Query help

Posted on 2007-11-26
18
248 Views
Last Modified: 2011-05-19
I need some help formatting an SQL query (MS SQL server) I am good at basic queries but this one I think will be over my head.

There is a table called SWQuality that has Location, Datetime, NH3 and Discharge  Location is a Nvarchar, Datime is a Date/Time data type, NH3 is nvarchr and Discharge is Numeric.  It has other fields but these are the ones we need to use in this query. There is also a Date field (Float Datatype) and a Time field with a Float Data type.  They are like this because of an Excel import tool that has to have them in this format. The Datetime does not have a time it is just a date...  don't ask me why..

So a row of data would look like:

LOCATION             DATETIME           DISCHARGE          DATE             TIME             NH3
Bolton                      11/16/2003           754                    20031116          400            0.058
Bolton                      11/16/2003           722                    20031116          1200          0.012
Bolton                      11/16/2003           740                    20031116           2000        -0.006
Bolton                      11/17/2003           760                    20031117           400           0.026

There are 1,000's of records in the database what I need is a way to query the records to get a daily average of the Discharge value for each day with the daily average on the NH3 value to display like

DateTime         Discharge       NH3
11/16/2003       730                .022
11/17/2003       740                .029
11/18/2003       755                .034

I can get close to this with this query:

Select MIN(Discharge) as MIN, Datetime from SWQuality where location like 'HUFFMAN' GROUP BY DateTime HAVING Datetime BETWEEN '10/14/2003' AND '3/26/2006' Order by DateTime

Not sure if MIN ir right but my problem is that I need to be able to display Missing data where I can plug in a variable for what constitutes MISSING data.  For example a missing X number of days between the dates.  Then I need it to display like below where the missing day is still there but there is no value:

DateTime         Discharge       NH3
11/16/2003       730                .022
11/17/2003                      
11/18/2003       755                .034

Can this be done?  I am not opposed to a stored procedure although I would prefer just a regular SQL statement as I am using ASP and the data results to populate a dynamic chart.

Can anyone help?

THANKS!






0
Comment
Question by:upgraders
  • 7
  • 5
  • 3
  • +1
18 Comments
 
LVL 15

Expert Comment

by:dbbishop
ID: 20354560
First step, check out using a calendar table at http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-calendar-table.html
An outer join from it to your table with a date range within the calendar table will provide the missing dates.

You can build a table on the fly, but that will add more overhead to your query. I would suggest a stored procedure (ALWAYS) for any queries, especially over the internet (reduces SQL Injection). To build a table onm the fly, you can do something like this (assuming you have the start and end dates:

DECLARE @StartDate SMALDATETIME
DECLARE @EndDate SMALLDATETIME
DECLARE @CurrDate SMALLDATETIME
SET @StartDate = '2003-10-14'
SET @EndDate = '2006-03-26'

CREATE TABLE Calendar (theDate SMALLDATETIME)

SET @CurrDate = @StartDate
WHILE @CurrDate <= @EndDate
BEGIN
    INSERT INTO Calendar (theDate) SELECT @CurrDate
    SET @CurrDate = DATEADD(dd, 1, @CurrDate
END

then
RIGHT JOIN Calendar ON SWQuality.Datetime = Calendar.theDate
0
 
LVL 32

Expert Comment

by:bhess1
ID: 20354612
The simplest way to approach this type of query is to create a table of dates to cover.

First, because this class of solutions is much more common than you may realize, let's create a table containing the numbers from 0 to 4095 inclusive.


CREATE TABLE Utility_Numbers (Num int)

INSERT INTO Utility_Numbers Values (0)
INSERT INTO Utility_Numbers Values (1)

INSERT INTO Utility_Numbers SELECT Num + 2 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 4 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 8 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 16 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 32 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 64 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 128 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 256 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 512 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 1024 FROM Utility_Numbers

INSERT INTO Utility_Numbers SELECT Num + 2048 FROM Utility_Numbers

-----------------------------------------------------------------

Now, to run your process, you need to provide two data points:  The start and end date range (this can be written to use a the number of days as well).  We will use these values in a stored procedure to provide the data you need:

CREATE PROCEDURE AverageDischargeData
    @BeginDate datetime,
    @EndDate datetime,
    @Location nvarchar(50)
AS
    SELECT Dates.Datetime,
        AvgDischarge,
        AvgNH3
    FROM (
        SELECT Dateadd(Day, Num, @BeginDate) As [DateTime]
        FROM Utility_Numbers un
        WHERE Num <= Datediff(Day, @BeginDate, @EndDate)
        ) As Dates
    LEFT JOIN (
        SELECT DateTime,
            Avg(Discharge) as AvgDischarge,
            Avg(NH3) As AvgNH3
        FROM SWQuality
        WHERE Location = @Location
            And DateTime >= @BeginDate
            AND DateTime <= @EndDate
        ) As Avgs
    ON Dates.Datetime = Avgs.Datetime
Order By Dates.Datetime
0
 
LVL 1

Author Comment

by:upgraders
ID: 20354621
Thanks for the quick reply..  when it comes to stored proeedures I am stupid I can pretty much copy and paste code but that is my extent.  What about the problem with the date and time fields?  there is one field with the date only and the other two are the date and time but are formated differently.
0
 
LVL 25

Accepted Solution

by:
imitchie earned 500 total points
ID: 20354866
first of all, create this function in the database

create function dbo.MakeDates( @startDate datetime, @endDate datetime )
returns @tbl table (thedate datetime) as
begin
 set @startDate = cast(floor(cast(@startDate as float)) as datetime)
 while @startDate <= @endDate
 begin
  insert into @tbl values(@startDate)
  select @startDate = @startDate + 1
 end
 return
end
GO

then, your query should look like this: (take care to use the start and end dates both in the MakeDates function and in the WHERE clause)

select DateTime, Avg(Discharge), Avg(NH3)
from dbo.MakeDates( '10/14/2003', '3/26/2006' )
 left join SWQuality on 1=1
where location like 'HUFFMAN'
GROUP BY DateTime
HAVING Datetime BETWEEN '10/14/2003' AND '3/26/2006'
Order by DateTime
0
 
LVL 32

Expert Comment

by:bhess1
ID: 20354888
Don't see that it matters much.  Unless you need the precise date and time together, ignore it.
0
 
LVL 1

Author Comment

by:upgraders
ID: 20355310
Thanks,  I created the user define function and tested the SQL query and it said

Incorrect Syntax near HAVING
0
 
LVL 32

Expert Comment

by:bhess1
ID: 20355331
You shouldn't need the HAVING statement.

Using imitchie's proc as the starting point, try this:

select DateTime, Avg(Discharge), Avg(NH3)
from dbo.MakeDates( '10/14/2003', '3/26/2006' ) dt
 left join SWQuality on SWQuality.Datetime = dt.Datetime
where location like 'HUFFMAN'
    AND dt.Datetime BETWEEN '10/14/2003' AND '3/26/2006'
Order by DateTime
0
 
LVL 32

Expert Comment

by:bhess1
ID: 20355336
Whoops!

select DateTime, Avg(Discharge), Avg(NH3)
from dbo.MakeDates( '10/14/2003', '3/26/2006' ) dt
 left join SWQuality on SWQuality.Datetime = dt.Datetime
where location like 'HUFFMAN'
    AND dt.Datetime BETWEEN '10/14/2003' AND '3/26/2006'
GROUP BY dt.Datetime
Order by dt.DateTime
0
 
LVL 1

Author Comment

by:upgraders
ID: 20355345
Humm  that gives me the error

Invalid Column Name Datetime
0
Get up to 2TB FREE CLOUD per backup license!

An exclusive Black Friday offer just for Expert Exchange audience! Buy any of our top-rated backup solutions & get up to 2TB free cloud per system! Perform local & cloud backup in the same step, and restore instantly—anytime, anywhere. Grab this deal now before it disappears!

 
LVL 25

Expert Comment

by:imitchie
ID: 20355353
I thought you said your table has these columns?
LOCATION             DATETIME           DISCHARGE          DATE             TIME             NH3

try this
select [DateTime], Avg(Discharge), Avg(NH3)
from dbo.MakeDates( '10/14/2003', '3/26/2006' )
 left join SWQuality on 1=1
where location like 'HUFFMAN'
   and [Datetime] BETWEEN '10/14/2003' AND '3/26/2006'
GROUP BY [DateTime]
Order by [DateTime]
0
 
LVL 1

Author Comment

by:upgraders
ID: 20355391
That worked.. but the results don't create the missing data  for example here is a snipet of the results aside from from a crazy decimal (maybe we could round to the second place after the decimal) the data jumps from 11/29/2003 to 3/23/2006

      11/28/2003      4125      0.090000000000002
      11/29/2003      3818      7.39999999999994E-02
      3/23/2006      680      0.090000000000002
      3/24/2006      667      4.99999999999995E-02

What would be fine would be fine or better is the day after the last value the next date with no value and the day before the next record with data to be blank that way I don't have 100's blank records. I think the chart component can take that, but if not I would need the missing data with the next consecutive date.

0
 
LVL 32

Expert Comment

by:bhess1
ID: 20355437
That's why my original suggestion uses a different technique.  To get the missing days, you must filter the data in the SWQuality table by location prior to  joining it to the derived dates table.

Here is a version of my stored procedure that uses the date table function that imitchie created:
CREATE PROCEDURE AverageDischargeData
    @BeginDate datetime,
    @EndDate datetime,
    @Location nvarchar(50)
AS
    SELECT Dates.Datetime,
        AvgDischarge,
        AvgNH3
    FROM dbo.MakeDates( @BeginDate, @EndDate ) As Dates
    LEFT JOIN (
        SELECT DateTime,
            Avg(Discharge) as AvgDischarge,
            Avg(NH3) As AvgNH3
        FROM SWQuality
        WHERE Location LIKE @Location
            And DateTime >= @BeginDate
            AND DateTime <= @EndDate
        ) As Avgs
    ON Dates.Datetime = Avgs.Datetime
Order By Dates.Datetime


To use it, type this in Query Analyzer:

EXEC AverageDischargeData 10/14/2003', '3/26/2006', 'HUFFMAN'
0
 
LVL 25

Assisted Solution

by:imitchie
imitchie earned 500 total points
ID: 20355459
bhess1,
while you're obviously a smart kid, that looks a tad complicated
select theDate as [DateTime], Avg(Discharge), Avg(NH3)

from dbo.MakeDates( '10/14/2003', '3/26/2006' )

 left join SWQuality on 

   location like 'HUFFMAN' and [DateTime] = theDate

GROUP BY theDate

Order by theDate

Open in new window

0
 
LVL 1

Author Comment

by:upgraders
ID: 20355473
Thanks for your help.. it is late here 10:40 PM  EST I will try it out tomorrow morning..  when will you guy check in again?
0
 
LVL 1

Author Comment

by:upgraders
ID: 20357788
That worked great..  thanks Guys..  I will have a few more advanced queries I'll need help with,  hopefully you will be around!  
0
 
LVL 15

Expert Comment

by:dbbishop
ID: 20360049
I would still recommend you build a permanent calendar table, realizing that every time you run this (or similar) queries, the function has to build the table variable within the function. There can be quite a bit of overhead involved there, especially when dealing with a query that involves several years, as your sample does. A table variable is usually rtecommended when you have only a small number of rows (under 100), then the recommendation is to go with a temp table, which usually requires even more overhead to build than a table variable and cannot be returned by a function.
Bu8ilding a permanent calander table as described in the link I provided, will alleviate many of those problems. If you don't care about weekends, holidays, etc, just do not include those columns and the code to populate them.
0
 
LVL 1

Author Comment

by:upgraders
ID: 20360073
Thanks...  I would not know how to even start.  If you want I can open a new question and let you help, if I am allowed to do that.
0
 
LVL 15

Expert Comment

by:dbbishop
ID: 20361291
See the article I mentioned in my first post. If all you need is a contiguous date range, just exeute the first two steps, CREATE TABLE and INSERT INTO

In the CRETAE TABLE statement, just use the following:
CREATE TABLE dbo.Calendar  
(  
    theDate SMALLDATETIME NOT NULL
        PRIMARY KEY CLUSTERED)

The code:
SET NOCOUNT ON
DECLARE @dt SMALLDATETIME
SET @dt = '20000101'
WHILE @dt < '20300101'
BEGIN
    INSERT dbo.Calendar(theDate) SELECT @dt
    SET @dt = @dt + 1
END


loads it with all dates between January 1, 2000 and December 31, 2029.

If you will be querying dates prior to 1/1/2000, change the initial value of:
SET @dt = '20000101'

change imitchie's code to:

select theDate as [DateTime], Avg(Discharge), Avg(NH3)
from Calander
 left join SWQuality on
   location like 'HUFFMAN' and [DateTime] = theDate
GROUP BY theDate
Order by theDate

0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now