sukhoi35
asked on
Suggestions for writing a query
Hello Experts,
Need your suggestion please as my knowledge with SQL is basic. I have a table having several thousands of rows. Each set of rows have a common id for example, the first 1000 rows belong to dataset no. 1 and so all have value 1 in the id column. Within these 1000 rows, I need to find the calculation of multiple two rows having same time stamp and compare with another row having similar time stamp.
For example, pick the value contained in the row having field X and pick the value contained in the row having field y, divide them and compare it with the value in the row z by display it as output similar to the columns below. I am not sure if Joins would be the right way as they are all part of the same table.
id | X Value | Y Value | Division Result | Z Value | Time Stamp
1 20 10 2 2 23:59:59
Need your suggestion please as my knowledge with SQL is basic. I have a table having several thousands of rows. Each set of rows have a common id for example, the first 1000 rows belong to dataset no. 1 and so all have value 1 in the id column. Within these 1000 rows, I need to find the calculation of multiple two rows having same time stamp and compare with another row having similar time stamp.
For example, pick the value contained in the row having field X and pick the value contained in the row having field y, divide them and compare it with the value in the row z by display it as output similar to the columns below. I am not sure if Joins would be the right way as they are all part of the same table.
id | X Value | Y Value | Division Result | Z Value | Time Stamp
1 20 10 2 2 23:59:59
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
Thank you experts for such a quick response, and apologies for not being clear here. I have tried to simulate data (as I can't paste the real data).
In the attached image, I have a table having rows grouped by a number. Here, I will pick y value of a particular time interval and divide it by the x value of the matching time interval and compare it with the z value of the matching time interval. Z is calculated by the calculation server. If you see, the z values do not match which helps me in identifying easily that there is a problem. Please note each group number can contain several organizations and each organization can have only one value of x, y and z i.e. only one set of x1, y1 and z1, one set of x2, y2 and z2.....so on. The date is of type 'datetime'
Please let me know if further details are required.
In the attached image, I have a table having rows grouped by a number. Here, I will pick y value of a particular time interval and divide it by the x value of the matching time interval and compare it with the z value of the matching time interval. Z is calculated by the calculation server. If you see, the z values do not match which helps me in identifying easily that there is a problem. Please note each group number can contain several organizations and each organization can have only one value of x, y and z i.e. only one set of x1, y1 and z1, one set of x2, y2 and z2.....so on. The date is of type 'datetime'
Please let me know if further details are required.
looks a lot like this:
SELECT
[Group No]
, Organization
, [Time Interval]
, MIN (VALUES) [X Value]
, MAX (VALUES) [Y Value]
, MAX (VALUES) / MIN (VALUES) [Z Value]
FROM that_table
GROUP BY
[Group No]
, Organization
, [Time Interval]
sorry missed the [Z Value] was a column
SELECT
[Group No]
, Organization
, [Time Interval]
, MIN (VALUES) [X Value]
, MAX (VALUES) [Y Value]
, MAX (VALUES) / MIN (VALUES) [Result (Y/X)]
, [Z Value]
FROM that_table
GROUP BY
[Group No]
, Organization
, [Time Interval]
, [Z Value]
ASKER
Hi PortletPaul,
Thanks for your response. Please note the values taken are just samples, the numbers can just be anything. I think, min and max would not be the right choice. X can be 13.5 and y can be 2, it is of type decimal. Is there any other way to identify the x value and y value please?
Thanks for your response. Please note the values taken are just samples, the numbers can just be anything. I think, min and max would not be the right choice. X can be 13.5 and y can be 2, it is of type decimal. Is there any other way to identify the x value and y value please?
ASKER
We need not consider all the time intervals. If we just consider the first time interval for each organization, should be sufficient for display in the output table.
I responded to what I see in that image. If you don't reveal details it is just a little impossible for us to guess what we don't see.
>>"Is there any other way to identify the x value and y value please?"
is "x" the "first value" and is y the "last" value
The real point here is you have to tell me what the logic is, all can do is guess
IF you provide a sample of data in a reusable format then perhaps we cold write a query. (nb:"data" is not an "image of data")
A simple copy/paste of the "input" section of your worksheet into a "code block" would help.
But note the the sample needs to be "representative", that is, you need to choose your sample carefully. But, the sample still needs to be relatively small. Don't include anything that might be private or sensitive.
If the expected output you have shown so far doesn't cover all the items of the sample then please also provide the new expected output.
>>"Is there any other way to identify the x value and y value please?"
is "x" the "first value" and is y the "last" value
The real point here is you have to tell me what the logic is, all can do is guess
IF you provide a sample of data in a reusable format then perhaps we cold write a query. (nb:"data" is not an "image of data")
A simple copy/paste of the "input" section of your worksheet into a "code block" would help.
But note the the sample needs to be "representative", that is, you need to choose your sample carefully. But, the sample still needs to be relatively small. Don't include anything that might be private or sensitive.
If the expected output you have shown so far doesn't cover all the items of the sample then please also provide the new expected output.
so for the "time interval" of 08:05 in your image, and the FIRST value = X, and the SECOND value = Y ? (please onfirm)
and we need only consider first and second for each "time interval"? (please confirm)
and we need only consider first and second for each "time interval"? (please confirm)
ASKER
Sorry Paul, because I am working on a remote machine from which I can't copy data, I have recreated it. I have tried to arrange for a sample excel file with some random meaningful numbers with column names morphed.In my above statement, I meant something like this:
- Org1 has x1, y1, z1 data for time intervals say starting at 8.00, 8.05, then 8.15, 8.20..... (for each x1, y1 and z1)
- Org2 has x1, y1, z1 data for similar time intervals like above
- Org3 has x1, y1, z1 data for similar time intervals like above
...and so on.
If I can just get the above mentioned output for just the top row when grouped as time interval for each of the Organization it would help me. I do have many other calculations that I will have to perform, but having this simple division as the base, I will modify it accordingly and do further operations. i.e. the output contains contains only the first row which will be the first time interval for each of the organizations. Also, the variable names x1, y1 and z1 can be considered always the same for each of the orgs. (please ignore x2, y2 and z2, etc in my previous sample).
sample-data.xlsx
- Org1 has x1, y1, z1 data for time intervals say starting at 8.00, 8.05, then 8.15, 8.20..... (for each x1, y1 and z1)
- Org2 has x1, y1, z1 data for similar time intervals like above
- Org3 has x1, y1, z1 data for similar time intervals like above
...and so on.
If I can just get the above mentioned output for just the top row when grouped as time interval for each of the Organization it would help me. I do have many other calculations that I will have to perform, but having this simple division as the base, I will modify it accordingly and do further operations. i.e. the output contains contains only the first row which will be the first time interval for each of the organizations. Also, the variable names x1, y1 and z1 can be considered always the same for each of the orgs. (please ignore x2, y2 and z2, etc in my previous sample).
sample-data.xlsx
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
Hi Paul,
Thank you very much for your efforts, much appreciated and am really grateful to you. Your code pretty much does what I was expecting and this question can be considered closed. Just checking with you if there is a way to do the same thing using a simpler and probably a longer way using self-join. The reason I am asking is the query is little too complicated for an amateur like me to modify for some of the other scenarios I am verifying. I will raise a separate question if that can be done.
Hi Olaf,
Thanks for your inputs. One column in the table stores all the input variables like I have indicated for each of the organization. For example, Org1 will have X1 to X10, Y1 to Y10. Some calculations are performed on each of the variable, division is just a simple example I have given. The calculation can be something like (X1 + Y1)/ Y2 all belonging to the same time interval. I compare the expected output (z) with the result. I am trying to create a separate query for each such formulas which I am verifying. Yes, all operands and results are in the same column, but that is the way the table has been designed.
Thank you very much for your efforts, much appreciated and am really grateful to you. Your code pretty much does what I was expecting and this question can be considered closed. Just checking with you if there is a way to do the same thing using a simpler and probably a longer way using self-join. The reason I am asking is the query is little too complicated for an amateur like me to modify for some of the other scenarios I am verifying. I will raise a separate question if that can be done.
Hi Olaf,
Thanks for your inputs. One column in the table stores all the input variables like I have indicated for each of the organization. For example, Org1 will have X1 to X10, Y1 to Y10. Some calculations are performed on each of the variable, division is just a simple example I have given. The calculation can be something like (X1 + Y1)/ Y2 all belonging to the same time interval. I compare the expected output (z) with the result. I am trying to create a separate query for each such formulas which I am verifying. Yes, all operands and results are in the same column, but that is the way the table has been designed.
sukhoi35 using self-joins is going to make this more complicated.
The brilliant thing about the "window functions" is they make life simpler. You just need to "play with them a bit" it won't take you long.
the key is getting an understanding of this bit: OVER( PARTITION BY ..... ORDER BY ....... )
https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql
The brilliant thing about the "window functions" is they make life simpler. You just need to "play with them a bit" it won't take you long.
the key is getting an understanding of this bit: OVER( PARTITION BY ..... ORDER BY ....... )
https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql
https://docs.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql
The outset is clear, sukhooi, the task to get out several values from multiple records is clear, your data itself also is identifying them now, I was just needing the real column names you still didn't tell. You seem to have come to a conclusion, so problem solved. But as said, you can't really have a table as you laid it out in your sample-data.xlsx, no database will allow same column name (values) twice!
Paul simply assumed a "Code" column containing the value names "X1" etc and also choose to name your second values column "nValues", as Values is a reserved keyword, you could still create a table with that column name put in square brackets or double quotes.
I also wouldn't suggest ORDER BY (select 1), Code is neither used for sorting nor partitioning. So it's working on an instable basis and depends on data coming in the correct order.
Paul, this never becomes a problem, but another one shows in your fiddle in Org2, where you have two Y1 rows, these should not give a result, as a Z value is missing, still you take the second Y1 value as Z value.
A good solution needs to take more care about the Code to make use of nValus as X,Y and ZValue than just expecting them to come in order. Notice once more: Neither your partitioning nor ORDER BY (select1) care for the Code at all.
Bye, Olaf.
Paul simply assumed a "Code" column containing the value names "X1" etc and also choose to name your second values column "nValues", as Values is a reserved keyword, you could still create a table with that column name put in square brackets or double quotes.
I also wouldn't suggest ORDER BY (select 1), Code is neither used for sorting nor partitioning. So it's working on an instable basis and depends on data coming in the correct order.
Paul, this never becomes a problem, but another one shows in your fiddle in Org2, where you have two Y1 rows, these should not give a result, as a Z value is missing, still you take the second Y1 value as Z value.
A good solution needs to take more care about the Code to make use of nValus as X,Y and ZValue than just expecting them to come in order. Notice once more: Neither your partitioning nor ORDER BY (select1) care for the Code at all.
Bye, Olaf.
@Olaf,
Yes I made many assumptions (e.g. renaming values to nValues, removing spaces in column names etc.) But I was doing so to elicit further comment and information about the requirements.
Sometimes folks have a very hard time stating their needs, and as already known, there are still more to come.
Yes I made many assumptions (e.g. renaming values to nValues, removing spaces in column names etc.) But I was doing so to elicit further comment and information about the requirements.
Sometimes folks have a very hard time stating their needs, and as already known, there are still more to come.
The point is, your solution is still instable and erroneous as shown because of double Y1 in Org2.
And you don't even pick out the three rows from possibly many more by eg filtering Left(Code,1) IN ('X','Y','Z'). To me the question whether there really is a column like that is still open, not only what's its name.
If you ask me, this whole problem is a candidate for pivoting the data, not for window functions.
Bye, Olaf.
And you don't even pick out the three rows from possibly many more by eg filtering Left(Code,1) IN ('X','Y','Z'). To me the question whether there really is a column like that is still open, not only what's its name.
If you ask me, this whole problem is a candidate for pivoting the data, not for window functions.
Bye, Olaf.
ASKER
Hi Olaf, Paul,
Thanks for the comments. I am sorry, I did not realize I had mistakenly repeated the column names. I haven't yet tried out the code shared by Paul on the real table which is pretty huge and I might have to customize it for each of the scenario I am verifying. My acceptance as solution was mostly on the sample data I had given. If I do face issues, I will surely recreate a better well organized actual data from the DB and seek your help as a new question. As always, I am very grateful for all your voluntary help.
Thanks for the comments. I am sorry, I did not realize I had mistakenly repeated the column names. I haven't yet tried out the code shared by Paul on the real table which is pretty huge and I might have to customize it for each of the scenario I am verifying. My acceptance as solution was mostly on the sample data I had given. If I do face issues, I will surely recreate a better well organized actual data from the DB and seek your help as a new question. As always, I am very grateful for all your voluntary help.
I regret that I don't understand by the words alone.
Oh! and you will need to define what "similar time stamp" actually means. (+/- 1 minute? +/- 15 minutes? less? more?)
Oh! (again) will also need to know what the actual "data type" of that column "time stamp". is it datetime? time? (hopefully not varchar!)