Suggestions for writing a query

Hello Experts,
Need your suggestion please as my knowledge with SQL is basic. I have a table having several thousands of rows. Each set of rows have a common id for example, the first 1000 rows belong to dataset no. 1 and so all have value 1 in the id column. Within these 1000 rows, I need to find the calculation of multiple two rows having same time stamp and compare with another row having similar time stamp.

For example, pick the value contained in the row having field X and pick the value contained in the row having field y, divide them and compare it with the value in the row z by display it as output similar to the columns below. I am not sure if Joins would be the right way as they are all part of the same table.


id | X Value | Y Value | Division Result | Z Value | Time Stamp
1         20                 10                        2                     2              23:59:59
sukhoi35Asked:
Who is Participating?
 
PortletPaulfreelancerCommented:
This result:
| GroupNo | XValue | YValue |       ResultValue | ZValue |         TimeInterval | Organization |
|---------|--------|--------|-------------------|--------|----------------------|--------------|
|       2 |  20.25 |     40 | 1.975308641975308 |   1.97 | 2017-10-01T08:00:00Z |         Org1 |
|       2 |     15 |     20 | 1.333333333333333 |   1.33 | 2017-10-01T18:00:00Z |         Org2 |
|       2 |     10 |     10 |                 1 |      1 | 2017-10-01T08:00:00Z |         Org3 |

Open in new window

From this query:
SELECT
      GroupNo
    , XValue
    , YValue
    , Yvalue / NULLIF(XValue,0) ResultValue
    , ZValue
    , TimeInterval
    , Organization
FROM (
      SELECT
            GroupNo
          , Organization
          , TimeInterval
          , first_value(nValues) over(partition by GroupNo, Organization, TimeInterval
                                      order by (select 1)) XValue
          , lead(nValues,1) over(partition by GroupNo, Organization, TimeInterval
                                 order by (select 1)) YValue
          , lead(nValues,2) over(partition by GroupNo, Organization, TimeInterval
                                 order by (select 1)) ZValue
          , row_number() over(partition by GroupNo, Organization, TimeInterval
                              order by (select 1)) rn
      FROM that_table
    ) d
WHERE rn = 1
;

Open in new window

see  http://sqlfiddle.com/#!6/c0e43/2
CREATE TABLE that_table
    ([GroupNo] int, [Code] varchar(2), [Organization] varchar(4), [TimeInterval] datetime, [nValues] decimal(12,2), [Flag] int)
;
    
INSERT INTO that_table
    ([GroupNo], [Code], [Organization], [TimeInterval], [nValues], [Flag])
VALUES
    (2, 'X1', 'Org1', '2017-10-01 08:00:00', 20.25, 0),
    (2, 'Y1', 'Org1', '2017-10-01 08:00:00', 40, 0),
    (2, 'Z1', 'Org1', '2017-10-01 08:00:00', 1.97, 0),
    (2, 'X1', 'Org2', '2017-10-01 18:00:00', 15, 0),
    (2, 'Y1', 'Org2', '2017-10-01 18:00:00', 20, 0),
    (2, 'Y1', 'Org2', '2017-10-01 18:00:00', 1.33, 0),
    (2, 'X1', 'Org3', '2017-10-01 08:00:00', 10, 0),
    (2, 'Y1', 'Org3', '2017-10-01 08:00:00', 10, 0),
    (2, 'Z1', 'Org3', '2017-10-01 08:00:00', 1, 0)
;

Open in new window

0
 
Olaf DoschkeSoftware DeveloperCommented:
You showed wanted output, thanks.

But what's the input here, can you clarify terms as "multiple two rows" and rows having field x,y,z? As a table has a set of fields, every row has the fields x,y and z. Or are you saying you have a field "name" or "key" with values x,y, or z and another field "value" with the x,y, or z values?

From what you describe so far it's only clear the id of your table is not an id in the meaning of identifier as primary key, it rather is a foreign key id of 1000rows all pointing to the same parent.Such fields should never be merely named "id", they should specify which parent they paint to and establish a relation to the parent table via foreign key constraint.

You say your SQL knowledge is basic. That shows here, no worries, that could be fixed as a side task, but let's concentrate on the situation as is, so could you please simply write out three records as the source of the values x,y and z? Thank you.

Bye, Olaf.
0
 
PortletPaulfreelancerCommented:
A copy/paste of some rows (just a few) that demonstrate what the source data looks like will greatly assist
I regret that I don't understand by the  words alone.

Oh! and you will need to define what "similar time stamp" actually means.  (+/- 1 minute? +/- 15 minutes? less? more?)

Oh! (again) will also need to know what the actual "data type" of that column "time stamp". is it datetime? time? (hopefully not varchar!)
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
sukhoi35Author Commented:
Thank you experts for such a quick response, and apologies for not being clear here. I have tried to simulate data (as I can't paste the real data).


In the attached image, I have a table having rows grouped by a number. Here, I will pick y value of a particular time interval and divide it by the x value of the  matching time interval and compare it with the z value of the matching time interval. Z is calculated by the calculation server. If you see, the z values do not match which helps me in identifying easily that there is a problem. Please note each group number can contain several organizations and each organization can have only one value of x, y and z i.e. only one set of x1, y1 and z1, one set of x2, y2 and z2.....so on. The date is of type 'datetime'

Please let me know if further details are required.Sample data
0
 
PortletPaulfreelancerCommented:
looks a lot like this:

SELECT
      [Group No]
    , Organization
    , [Time Interval]
    , MIN (VALUES) [X Value]
    , MAX (VALUES) [Y Value]
    , MAX (VALUES) / MIN (VALUES) [Z Value]
FROM that_table
GROUP BY
      [Group No]
    , Organization
    , [Time Interval]

Open in new window

0
 
PortletPaulfreelancerCommented:
sorry missed the [Z Value] was a column

SELECT
      [Group No]
    , Organization
    , [Time Interval]
    , MIN (VALUES) [X Value]
    , MAX (VALUES) [Y Value]
    , MAX (VALUES) / MIN (VALUES)  [Result (Y/X)]
    , [Z Value]
FROM that_table
GROUP BY
      [Group No]
    , Organization
    , [Time Interval]
    , [Z Value]

Open in new window

0
 
sukhoi35Author Commented:
Hi PortletPaul,
Thanks for your response. Please note the values taken are just samples, the numbers can just be anything. I think, min and max would not be the right choice. X can be 13.5 and y can be 2, it is of type decimal. Is there any other way to identify the x value and y value please?
0
 
sukhoi35Author Commented:
We need not consider all the time intervals. If we just consider the first time interval for each organization, should be sufficient for display in the output table.
0
 
PortletPaulfreelancerCommented:
I responded to what I see in that image. If you don't reveal details it is just a little impossible for us to guess what we don't see.

>>"Is there any other way to identify the x value and y value please?"
is "x" the "first value" and is y the "last" value
The real point here is you have to tell me what the logic is, all can do is guess

IF you provide a sample of data  in a reusable format then perhaps we cold write a query. (nb:"data" is not an "image of data")

A simple copy/paste of the "input" section of your worksheet into a "code block" would help.

But note the the sample needs to be "representative", that is, you need to choose your sample carefully. But, the sample still needs to be relatively small.  Don't include anything that might be private or sensitive.

If the expected output you have shown so far doesn't cover all the items of the sample then please also provide the new expected output.
0
 
PortletPaulfreelancerCommented:
so for the "time interval" of 08:05 in your image, and the FIRST value = X, and the SECOND value = Y  ? (please onfirm)
and we need only consider first and second for each "time interval"? (please confirm)
0
 
sukhoi35Author Commented:
Sorry Paul, because I am working on a remote machine from which I can't copy data, I have recreated it. I have tried to arrange for a sample excel file with some random meaningful numbers with column names morphed.In my above statement, I meant something like this:

- Org1 has x1, y1, z1 data for  time intervals say starting at 8.00, 8.05, then 8.15, 8.20..... (for each x1, y1 and z1)
- Org2 has x1, y1, z1 data for similar time intervals like above
- Org3 has x1, y1, z1 data for similar time intervals like above
...and so on.

If I can just get the above mentioned output for just the top row when grouped as time interval for each of the Organization it would help me. I do have many other calculations that I will have to perform, but having this simple division as the base, I will modify it accordingly and do further operations. i.e. the output contains contains only the first row which will be the first time interval for each of the organizations. Also, the variable names x1, y1 and z1 can be considered always the same for each of the orgs. (please ignore x2, y2 and z2, etc in my previous sample).
sample-data.xlsx
0
 
Olaf DoschkeSoftware DeveloperCommented:
Is there really a column containing the literal textual values "X1","Y1","Z1"?
You named both column B and E as "values", whereas one categorizes it as x,y, or z value while the other contains the values.
could you simply create your sample data 1:1 with names of the real table columns?
And if there is no column stating which value is x, which value is y and which value is z, how is that to be determined? By first/second/third record within the same timestamp? Ordered in which way, then?

So far I'd fetch out a list
SELECT "Group No.", "Organization", MIN("Time Interval") as Timestamp FROM yourtable GROUP BY "Group No.", "Organization"

Open in new window

Then in step 2 matching records having these Group No., Organization and Timestamp will limit it to X1, Y1, and Z1 records. Now the missing information is necessary to categorize the values as x,y, and z, you need to have different column names here, not twice "values", that won't work out. That's why Paul struggled and tried with MIN/MAX.

Bye, Olaf.
1
 
sukhoi35Author Commented:
Hi Paul,
Thank you very much for your efforts, much appreciated and am really grateful to you. Your code pretty much does what I was expecting and this question can be considered closed. Just checking with you if there is a way to do the same thing using a simpler and probably a longer way using self-join. The reason I am asking is the query is little too complicated for an amateur like me to modify for some of the other scenarios I am verifying. I will raise a separate question if that can be done.

Hi Olaf,
Thanks for your inputs. One column in the table stores all the input variables like I have indicated for each of the organization. For example, Org1 will have X1 to X10, Y1 to Y10. Some calculations are performed on each of the variable, division is just a simple example I have given. The calculation can be something like (X1 + Y1)/ Y2 all belonging to the same time interval. I compare the expected output (z) with the result. I am trying to create a separate query for each such formulas which I am verifying. Yes, all operands and results are in the same column, but that is the way the table has been designed.
0
 
PortletPaulfreelancerCommented:
sukhoi35 using self-joins is going to make this more complicated.

The brilliant thing about the "window functions" is they make life simpler. You just need to "play with them a bit" it won't take you long.

the key is getting an understanding of this bit: OVER( PARTITION BY .....  ORDER BY ....... )

https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql
0
 
Olaf DoschkeSoftware DeveloperCommented:
The outset is clear, sukhooi, the task to get out several values from multiple records is clear, your data itself also is identifying them now, I was just needing the real column names you still didn't tell. You seem to have come to a conclusion, so problem solved. But as said, you can't really have a  table as you laid it out in your sample-data.xlsx, no database will allow same column name (values) twice!

Paul simply assumed a "Code" column containing the value names "X1" etc and also choose to name your second values column "nValues", as Values is a reserved keyword, you could still create a table with that column name put in square brackets or double quotes.

I also wouldn't suggest ORDER BY (select 1), Code is neither used for sorting nor partitioning. So it's working on an instable basis and depends on data coming in the correct order.

Paul, this never becomes a problem, but another one shows in your fiddle in Org2, where you have two Y1 rows, these should not give a result, as a Z value is missing, still you take the second Y1 value as Z value.

A good solution needs to take more care about the Code to make use of nValus as X,Y and ZValue than just expecting them to come in order. Notice once more: Neither your partitioning nor ORDER BY (select1) care for the Code at all.

Bye, Olaf.
1
 
PortletPaulfreelancerCommented:
@Olaf,

Yes I made many assumptions (e.g. renaming values to nValues, removing spaces in column names etc.)  But I was doing so to elicit further comment and information about the requirements.

Sometimes folks have a very hard time stating their needs, and as already known,  there are still more to come.
0
 
Olaf DoschkeSoftware DeveloperCommented:
The point is, your solution is still instable and erroneous as shown because of double Y1 in Org2.

And you don't even pick out the three rows from possibly many more by eg filtering Left(Code,1) IN ('X','Y','Z'). To me the question whether there really is a column like that is still open, not only what's its name.

If you ask me, this whole problem is a candidate for pivoting the data, not for window functions.

Bye, Olaf.
0
 
sukhoi35Author Commented:
Hi Olaf, Paul,
Thanks for the comments. I am sorry, I did not realize I had mistakenly repeated the column names. I haven't yet tried out the code shared by Paul on the real table which is pretty huge and I might have to customize it for each of the scenario I am verifying. My acceptance as solution was mostly on the sample data I had given. If I do face issues, I will surely recreate a better well organized actual data from the DB and seek your help as a new question. As always, I am very grateful for all your voluntary help.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.