Solved

Need approach to structure / debug query

Posted on 2013-05-12
45
368 Views
Last Modified: 2013-05-20
I have a table with events that I need to summarize (see attached for a more clearly formatted example then the one below). I have been writing a view for each event then linking the views. I can easily debug each view to narrow down any issues. On the other hand this seems grossly inefficient as a structure (and likely for performance). There is probably not only a better way to write this query (ie subquerys) but also a tool that is better suited to build and debug these more complex querys. So I am looking for some guidance Book1.xls


REF:            
Event Table                              
ID      TransactionId      EventID      Date            
1      12      1      1-Nov            
2      12      3      5-Nov            
3      12      4      8-Nov            
4      12      5      9-Nov            
1      13      1      1-Nov            
2      13      3      5-Nov            
3      13      5      8-Nov            
4      13      8      9-Nov            
                              
Desired Result                               
TransactionId      Event1      Event3      Event4      Event5      Event8
12      1-Nov      5-Nov      8-Nov      9-Nov      
13      1-Nov      5-Nov            8-Nov      9-Nov
0
Comment
Question by:swendell
  • 20
  • 19
  • 3
  • +2
45 Comments
 
LVL 77

Expert Comment

by:peter57r
ID: 39159648
Your topic areas are bit varied.
What product are you trying to do this in?

It looks like s crosstab query (transform query) for a database or a standard crosstab report for Crystal.
0
 

Author Comment

by:swendell
ID: 39159737
Yes I am trying to flatten data as shown at bottom of my posted spreadsheet preferable at the SQL level. I am working in MySQL and MS SQL server, I also own Crystal Reports XI. What would the SQL look like to flatten out the data in my example?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39160114
SQL facilities for this differ a lot between MySQL and MS SQL, which of the 2 dbms platforms are you executing this query in? just one? (if so which)
What versions of MySQL and MS SQL?
0
 

Author Comment

by:swendell
ID: 39160178
Mysql but curious about both
0
 
LVL 100

Expert Comment

by:mlmcc
ID: 39160351
If you are trying to report on this, it can be done quite easily in Crystal with the cross tab as mentioned.

Rows  - Transaction Id
Columns - Event ID
Summary - Maximum Date field

mlmcc
0
 

Author Comment

by:swendell
ID: 39165438
Well Crystal is a good tool and can flatten out data as you explained above but I am not sure how to use it to select properly. For example: In the case you presented, what if I wanted to limit the report to include rows (transactions) where EventID 3 occurred in Jan-2012? So the rows could contain many events/columns that occurred in other months as long as the date for EventID3 was in JAN-2012. That is where I guess we get into multiple pass reports? Kind of why I wanted the data flattened before I started to select.

What do you suggest?
0
 
LVL 100

Expert Comment

by:mlmcc
ID: 39165464
Crystal has a selection tool.

{EventIdField} = 3 and {DateField} >= Date(2012,1,1) AND {DateField} < Date(2012,2,1)

You can create parameters so the user can select the criteria.

mlmcc
0
 

Author Comment

by:swendell
ID: 39165711
No that does not work. It returns a cross-tab with only one column for EventID 3. That is why I mentioned multi-pass.  

The first part of my request works but not the second piece:
"what if I wanted to limit the report to include rows (transactions) where EventID 3 occurred in Jan-2012? So the rows could contain many events/columns that occurred in other months as long as the date for EventID3 was in JAN-2012. "

In my original example / spreadsheet; the desired result (with all the columns) should appear if I filtered Event ID -3 as: NOV-5 to Nov-6
0
 
LVL 100

Expert Comment

by:mlmcc
ID: 39165753
Ok, you have something that has many events and you want to see all information on the somethings with an event=3 in Jan 2012

Can you build a query in the database?

It will be something like

SELECT ID, otherfields
FROM YourTable
WHERE  ID IN (SELECT ID FROM YourTable WHERE Event=3 and EventDate >= #01/01/2013# and EventDate < #02/01/2013#)

mlmcc
0
 

Author Comment

by:swendell
ID: 39166010
1) You are now clearly understanding the situation and we are back to sub-querys as mentioned in my original post.
2) I can build a query in database or maybe more appropriately; I can add a command in the Crystal Database Expert.
3) What sql design tool can be used to more easily generate create these subquerys.  please see my original question:

"There is probably not only a better way to write this query (ie subquerys) but also a tool that is better suited to build and debug these more complex querys."

I do not think Crystal XI has that type of query builder?
0
 

Author Comment

by:swendell
ID: 39166661
It times out before completing:

Select
  projectcomment.projectid,
  projectcomment.commentstatusid,
  projectcomment.commentdate,
  projectcomment.projectcommentid,
  project.projectname
From
  projectcomment Inner Join
  project On project.projectid = projectcomment.projectid
Where
  projectcomment.projectid In (Select
    projectcomment.projectid
  From
    projectcomment
  Where
    projectcomment.commentstatusid = 6 And
    projectcomment.commentdate > '2013-05-11')
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39166726
Hi.

Firstly I would like to say I would recommend using an SQL tool to develop queries, then utilize them in Crystal. There are so many SQL tools out there  and I know so few that I don't know what to recommend. There will be several open source products available just for MySQL for example. Ideally whatever tool you do decide on is capable of working with multiple database types. I use commercial tools not open source, but there are some good ones out there I'm pretty sure.

Right, so now your query above.

Please, please please do not use IN() when the number of records inside those brackets is an unknown. It is a very inefficient query method and timing out is the symptom of this.

You don't need to subquery in this case, just apply the where clause directly to the tables you have joined together, like this:
SELECT
      projectcomment.projectid
    , projectcomment.commentstatusid
    , projectcomment.commentdate
    , projectcomment.projectcommentid
    , project.projectname
FROM projectcomment
INNER JOIN project
    ON project.projectid = projectcomment.projectid
WHERE  projectcomment.commentstatusid = 6
            AND projectcomment.commentdate > '2013-05-11'

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39166738
I believe this open source tool is quite popular - I have used it only once so don't know it well.
Plenty of features and supports multiple dbms platforms.

http://squirrel-sql.sourceforge.net/
0
 

Author Comment

by:swendell
ID: 39166910
PortletPaul,

Please Read the original post again.
We need to flatten a file. Your query does not, it is a standard join

Mimcc made the same mistake at first before realizing a sub query was required to flatten the file.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39166935
I was commenting on the last query (which would not flatten the structure anyway) just trying to point out that using

where something IN( select an unknown number of something from somewhere)

is a potentially very bad thing in performance terms.

OK, a MySQL 'pivot' is required (note MySQL is very different to ms sql here). Got one of those somewhere, I'll be back, hopefully, with something on this.
0
 

Author Comment

by:swendell
ID: 39166941
Ok great, 'pivot' sounds interesting , and yes this is MySQL but I am curious, what is the comparable way in ms SQL server?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39166952
for this result:
TRANSACTIONID EVENTID_1    EVENTID_3    EVENTID_4    EVENTID_5    EVENTID_8
12            Nov, 01 2001 Nov, 05 2001 Nov, 08 2001 Nov, 09 2001    (null)
13            Nov, 01 2001 Nov, 05 2001 (null)       Nov, 08 2001    Nov, 09 2001

Open in new window

the MySQL way:
SET @group_concat_max_len = 4096;

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId');

PREPARE stmt FROM @qryvar;
EXECUTE stmt;

Open in new window

see it at: http://sqlfiddle.com/#!9/b043f/2

in MSSQL there is a "PIVOT" feature, quite different to the above.
http://msdn.microsoft.com/en-AU/library/ms177410(v=sql.105).aspx
0
 

Author Comment

by:swendell
ID: 39166963
I will try that at office tommorow , could you give a brief explanation of what you are doing in that code like what you are setting, col_ref,concat, DC, etc....
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39166976
mmmmm

GROUP_CONCAT is a MySQL specific function, in the above code the output from just that part is (but I have added line breaks so you can read it):

max(case when eventid=1 then `Date` else NULL end) as eventid_1,
max(case when eventid=3 then `Date` else NULL end) as eventid_3,
max(case when eventid=4 then `Date` else NULL end) as eventid_4,
max(case when eventid=5 then `Date` else NULL end) as eventid_5,
max(case when eventid=8 then `Date` else NULL end) as eventid_8

in other words it is preparing a small snippet of SQL for each (distinct) eventid

then we add some further text to this
set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId');

so the whole @qryvar now actually stores a complete SQL query that looks like this:

select TransactionId,
+ << the above set of case expressions >>
from event Group By TransactionId


select TransactionId,
max(case when eventid=1 then `Date` else NULL end) as eventid_1,
max(case when eventid=3 then `Date` else NULL end) as eventid_3,
max(case when eventid=4 then `Date` else NULL end) as eventid_4,
max(case when eventid=5 then `Date` else NULL end) as eventid_5,
max(case when eventid=8 then `Date` else NULL end) as eventid_8
from event Group By TransactionId

then this dynamically created sql query is actually executed by these 2 lines:

PREPARE stmt FROM @qryvar;
EXECUTE stmt;

suggest you lookup group_concat in the MySQL documentation.
0
 

Author Comment

by:swendell
ID: 39167661
I tried SQL Fiddle, neat tool. I caught one piece you left out

WHERE  projectcomment.commentstatusid = 6 AND projectcomment.commentdate > '2013-05-11'

meaning there should be no result row unless the row contains a date beyond 5/11/13 in EventId 6....
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39167740
the where conditions may be included in the final concatenation, the harder part is determining the required columns - but the 2 need to be coordinated otherwise you may run out of string length before the query gets to be executed.

i'm guessing you are expecting me to do this??? mmm, do I have the time - I'll see
0
 

Author Comment

by:swendell
ID: 39167771
Please; if you could, above my current skill level
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 48

Expert Comment

by:PortletPaul
ID: 39167779
OK, the reason all of that is not included would be:

Unknown column 'projectcomment.commentstatusid'

so, untested, the amended code would look like this (i.e. you debug)
set @FromDate = '2013-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event 
      where projectcomment.commentstatusid = 6 AND projectcomment.commentdate > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId where projectcomment.commentstatusid = 6 AND projectcomment.commentdate > ',@FromDate);

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

if you want to amend the sqlfiddle to include the untested part you are welcome to do so
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39168065
If having trouble let me know, but I think you will this where clause isn't too difficult really - it goes where it would normally go :)
0
 

Author Comment

by:swendell
ID: 39168574
getting close
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'NULL' at line 1: PREPARE stmt FROM @qryvar

set @FromDate = '2001-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_ ',eventid) 
      as col_ref from event 
      where eventid = 6 AND date > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId where eventid = 6 AND date > ' ,@FromDate);

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39169818
see: http://sqlfiddle.com/#!9/b043f/21

there is no data to match these conditions:
where eventid = 6 AND date > @FromDate

altering the conditions to suit the small sample of data, this works, but shows you how to add a where clause
set @FromDate = '2001-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event
      where eventid < 6 AND `Date` > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event where eventid < 6 AND `Date` > ',@FromDate,' Group By TransactionId ');

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

0
 

Author Comment

by:swendell
ID: 39173331
So close...
But I still think you need the "IN" clause because your WHERE clause limits which columns are included NOT which rows. We want to limit the rows. On a positive note; the output is formatted correctly.

In your last example  using that data if I were to limit to EventID 8 where EventID 8 occurs > 2001-11-08;  I should only return one row for transaction# 13. That row should have columns as follows

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'
EventID_8 with date '2001-11-09'

Notice the dates in the first three columns should not be filtered upon, only the date in column 8 is checked to see if > 2001-11-08 and if it meets that criteria the entire row should appear. Transaction row #12 is excluded because it does not have an Event #8 with a date > 2001-11-08

Does what I am asking make sense? Are you sure we do not need the IN clause?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39173404
It's your where clause, I didn't invent it.

But I think you may have misunderstood one small bit here.

When constructing this dynamic sql, there are 2 quite different steps.

A: we must figure out what the columns need to be,
B: we get the data (rows) that will line up to those columns

because those 2 steps need to be synchronized, "the where clause" MUST be used twice, once in both steps. If this is not done the results will be .... (let's just say wrong).

So, yes the where clause is used to work out the columns
and, the same where clause is used to limit rows
see line 14 of that first code block above, this bit applies to the rows:
' from event where eventid < 6 AND `Date` > ',@FromDate,

Does it need an IN()?

not from what you have shared with me.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39173415
this didn't makes sense to me I'm afraid:
Notice the dates in the first three columns should not be filtered upon, only the date in column 8 is checked to see if > 2001-11-08 and if it meets that criteria the entire row should appear. Transaction row #12 is excluded because it does not have an Event #8 with a date > 2001-11-08

Does what I am asking make sense? Are you sure we do not need the IN clause?
I did not propose the where clause to use. If the where clause is incorrect, what is the correct one?

The where clause provided asks for a specific eventid (6)
and 'date' > somedate

at no point have you indicated "dates in the first three columns should not be filtered upon" which to be honest I just don't follow.

I have provided you with the ability to pivot your MySQL data, it's up to you to define the correct filter I'm afraid. If the filtering syntax is an issue I can help with that - but you need to help me by translating that 3 column bit above
0
 

Author Comment

by:swendell
ID: 39173558
We are clear on the columns, all the appropriate columns are appearing properly for each transaction, so we are good on that part.

The confusion is with the rows. I want to include/exclude rows from query result based upon a criteria. That criteria will be if Event X has a date > XX/XX/XX. Does that make sense?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39173599
yes, but, that's what the existing filter does, are you heading in this direction perhaps?

WHERE
(evenidid = 6 and 'date' > 2013-05-05)
OR
(evenidid = 8 and 'date' > 2013-05-10)

i.e. the filtering is done in pairs (event & date) or (event & date)...

the point being (for me) is that the filter is still yours to make, and you should by now know where it must be used.

perhaps try developing the required filter in just a normal query (unpivotted) - then apply it to the pivot?

don't forget though that the pivot will produce a matrix that must cover the entire result (both rows and columns)
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39173617
>>if I were to limit to EventID 8 where EventID 8 occurs > 2001-11-08;  I should only return one row for transaction# 13

see: http://sqlfiddle.com/#!9/b043f/24

That row should have columns as follows

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'
EventID_8 with date '2001-11-09'


NOT in the current query it won't

What is the logic here? you have EXPLICITLY asked for dates > 2001-11-08

none of these are > 2001-11-08

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'

I still don't understand I'm afraid, you will have to explain it very slowly and in detail.
0
 
LVL 34

Expert Comment

by:James0628
ID: 39174112
I may be wrong, or explaining something that you already know, but I think he's saying that he wants to include _all_ of the events for a TransactionId, if a specific EventID occurs after a specific date.  TransactionId 13 has EventID 8 with a date after 2001-11-08, so TransactionId 13 is included, with all of its events.  TransactionId 12 does not have an entry for EventID 8 that's after that date (it has no entry for 8 at all), so TransactionId 12 is not included at all.

 Perhaps that's where the use of IN () was supposed to come in?  Include the events for a TransactionId, if that TransactionId is in the list of ones with entries for the desired EventID and date?

 I'm not sure where the desired EventID and date are coming from.  Perhaps entered as parameters at run time?

 James
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39174427
Thank you James - very nice deduction, which I simply did not see (at all) - and clear to follow
Sounds quite reasonable.

may need humble pie

looks like IN() will be required :(
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 39174468
TRANSACTIONID    EVENTID_1    EVENTID_3    EVENTID_5    EVENTID_8
13    Nov, 01 2001    Nov, 05 2001    Nov, 08 2001    Nov, 09 2001

Open in new window

humble pie, but with ice cream please
http://sqlfiddle.com/#!9/b043f/39
set @FromDate = '2001-11-08';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
          concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
          as col_ref 
      from event
      where transactionid IN (select distinct transactionid from Event where eventid = 8 AND `Date` > @FromDate)
      ) dc
  );

set @qryvar = concat('select TransactionId,'
                     , @qryvar
                     , ' from event where transactionid IN (select distinct transactionid from Event where eventid = 8 AND `Date` >  '
                     , @FromDate
                     , ')'
                     , ' Group By TransactionId '
                    );

PREPARE stmt FROM @qryvar;
EXECUTE stmt;

Open in new window

0
 

Author Comment

by:swendell
ID: 39174539
James - exactly! and yes parameters are entered at run time
I was running out of ways to explain, the output looks right now and I see the IN as well
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39174566
yes, humble pie is on order - but with raspberries to make it palatable

IN() is used so that the filter locates the associated transactionid to whatever that injected parameter(s) is/are

in my defence, telling me to use IN() did not explain the required logic - there was a small part missing :)
0
 

Author Comment

by:swendell
ID: 39174849
Yes, I could have been more clear,just could not find a way
0
 

Author Closing Comment

by:swendell
ID: 39174852
a little bit of a struggle to communicate but in the end we got there
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39174871
I believe James really should get some kudos here - without that keen observation the solution was in limbo.

It is possible to request a change to points distribution - I shall leave that to your discretion but very pleased you have the solution now.

Cheers,
Paul

and thank you James!
0
 

Author Comment

by:swendell
ID: 39175207
fine with me, what is the procedural to split awarded points?
0
 
LVL 34

Expert Comment

by:James0628
ID: 39175214
No points necessary, as far as I'm concerned.  I just helped a bit with communication.  You did all the heavy lifting.  :-)

 And you're welcome.  :-)

 James
0
 

Author Comment

by:swendell
ID: 39180889
Was there anyway to do this WITHOUT a WHERE IN clause?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39180913
mmmm, really good question - I will need some think time
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39181035
Yes; a temporary table of TransactionId values, then inner join to that table.

MySQL now also has stored procedures, so you could pass a set of wanted transactionID's to a procedure and build a result using procedural code.

I'm not recommending either, just exploring possibilities.

where IN(......)
isn't my favorite solution when the bit in the middle is an unknown quantity of items. BUT, avoiding IN() might be as bad or worse than using it.

How many transactionid's do you believe you would be requesting "most of the time"?
if it's tens probably not an issue, if it's a hundred or so might become an issue, if its several hundred or in larger units then think of alternatives now.

Performance isn't just a matter of the query structure, the indexing of the tables and the hardware (disk/ram etc) and current load play a part - but IN() can be a poor performer if "abused". Can't offer you any precise numbers on this point because "it will depend" on several factors.

and X IN(1,2,3,... 1000) -- is the same as below

and ( X = 1 OR X=2 OR X=3 ..... OR X=1000) -- is the same as above
0

Featured Post

Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

Join & Write a Comment

Creating and Managing Databases with phpMyAdmin in cPanel.
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now