Link to home
Start Free TrialLog in
Avatar of swendell
swendell

asked on

Need approach to structure / debug query

I have a table with events that I need to summarize (see attached for a more clearly formatted example then the one below). I have been writing a view for each event then linking the views. I can easily debug each view to narrow down any issues. On the other hand this seems grossly inefficient as a structure (and likely for performance). There is probably not only a better way to write this query (ie subquerys) but also a tool that is better suited to build and debug these more complex querys. So I am looking for some guidance Book1.xls


REF:            
Event Table                              
ID      TransactionId      EventID      Date            
1      12      1      1-Nov            
2      12      3      5-Nov            
3      12      4      8-Nov            
4      12      5      9-Nov            
1      13      1      1-Nov            
2      13      3      5-Nov            
3      13      5      8-Nov            
4      13      8      9-Nov            
                              
Desired Result                               
TransactionId      Event1      Event3      Event4      Event5      Event8
12      1-Nov      5-Nov      8-Nov      9-Nov      
13      1-Nov      5-Nov            8-Nov      9-Nov
Avatar of peter57r
peter57r
Flag of United Kingdom of Great Britain and Northern Ireland image

Your topic areas are bit varied.
What product are you trying to do this in?

It looks like s crosstab query (transform query) for a database or a standard crosstab report for Crystal.
Avatar of swendell
swendell

ASKER

Yes I am trying to flatten data as shown at bottom of my posted spreadsheet preferable at the SQL level. I am working in MySQL and MS SQL server, I also own Crystal Reports XI. What would the SQL look like to flatten out the data in my example?
SQL facilities for this differ a lot between MySQL and MS SQL, which of the 2 dbms platforms are you executing this query in? just one? (if so which)
What versions of MySQL and MS SQL?
Mysql but curious about both
If you are trying to report on this, it can be done quite easily in Crystal with the cross tab as mentioned.

Rows  - Transaction Id
Columns - Event ID
Summary - Maximum Date field

mlmcc
Well Crystal is a good tool and can flatten out data as you explained above but I am not sure how to use it to select properly. For example: In the case you presented, what if I wanted to limit the report to include rows (transactions) where EventID 3 occurred in Jan-2012? So the rows could contain many events/columns that occurred in other months as long as the date for EventID3 was in JAN-2012. That is where I guess we get into multiple pass reports? Kind of why I wanted the data flattened before I started to select.

What do you suggest?
Crystal has a selection tool.

{EventIdField} = 3 and {DateField} >= Date(2012,1,1) AND {DateField} < Date(2012,2,1)

You can create parameters so the user can select the criteria.

mlmcc
No that does not work. It returns a cross-tab with only one column for EventID 3. That is why I mentioned multi-pass.  

The first part of my request works but not the second piece:
"what if I wanted to limit the report to include rows (transactions) where EventID 3 occurred in Jan-2012? So the rows could contain many events/columns that occurred in other months as long as the date for EventID3 was in JAN-2012. "

In my original example / spreadsheet; the desired result (with all the columns) should appear if I filtered Event ID -3 as: NOV-5 to Nov-6
Ok, you have something that has many events and you want to see all information on the somethings with an event=3 in Jan 2012

Can you build a query in the database?

It will be something like

SELECT ID, otherfields
FROM YourTable
WHERE  ID IN (SELECT ID FROM YourTable WHERE Event=3 and EventDate >= #01/01/2013# and EventDate < #02/01/2013#)

mlmcc
1) You are now clearly understanding the situation and we are back to sub-querys as mentioned in my original post.
2) I can build a query in database or maybe more appropriately; I can add a command in the Crystal Database Expert.
3) What sql design tool can be used to more easily generate create these subquerys.  please see my original question:

"There is probably not only a better way to write this query (ie subquerys) but also a tool that is better suited to build and debug these more complex querys."

I do not think Crystal XI has that type of query builder?
It times out before completing:

Select
  projectcomment.projectid,
  projectcomment.commentstatusid,
  projectcomment.commentdate,
  projectcomment.projectcommentid,
  project.projectname
From
  projectcomment Inner Join
  project On project.projectid = projectcomment.projectid
Where
  projectcomment.projectid In (Select
    projectcomment.projectid
  From
    projectcomment
  Where
    projectcomment.commentstatusid = 6 And
    projectcomment.commentdate > '2013-05-11')
Hi.

Firstly I would like to say I would recommend using an SQL tool to develop queries, then utilize them in Crystal. There are so many SQL tools out there  and I know so few that I don't know what to recommend. There will be several open source products available just for MySQL for example. Ideally whatever tool you do decide on is capable of working with multiple database types. I use commercial tools not open source, but there are some good ones out there I'm pretty sure.

Right, so now your query above.

Please, please please do not use IN() when the number of records inside those brackets is an unknown. It is a very inefficient query method and timing out is the symptom of this.

You don't need to subquery in this case, just apply the where clause directly to the tables you have joined together, like this:
SELECT
      projectcomment.projectid
    , projectcomment.commentstatusid
    , projectcomment.commentdate
    , projectcomment.projectcommentid
    , project.projectname
FROM projectcomment
INNER JOIN project
    ON project.projectid = projectcomment.projectid
WHERE  projectcomment.commentstatusid = 6
            AND projectcomment.commentdate > '2013-05-11'

Open in new window

I believe this open source tool is quite popular - I have used it only once so don't know it well.
Plenty of features and supports multiple dbms platforms.

http://squirrel-sql.sourceforge.net/
PortletPaul,

Please Read the original post again.
We need to flatten a file. Your query does not, it is a standard join

Mimcc made the same mistake at first before realizing a sub query was required to flatten the file.
I was commenting on the last query (which would not flatten the structure anyway) just trying to point out that using

where something IN( select an unknown number of something from somewhere)

is a potentially very bad thing in performance terms.

OK, a MySQL 'pivot' is required (note MySQL is very different to ms sql here). Got one of those somewhere, I'll be back, hopefully, with something on this.
Ok great, 'pivot' sounds interesting , and yes this is MySQL but I am curious, what is the comparable way in ms SQL server?
for this result:
TRANSACTIONID EVENTID_1    EVENTID_3    EVENTID_4    EVENTID_5    EVENTID_8
12            Nov, 01 2001 Nov, 05 2001 Nov, 08 2001 Nov, 09 2001    (null)
13            Nov, 01 2001 Nov, 05 2001 (null)       Nov, 08 2001    Nov, 09 2001

Open in new window

the MySQL way:
SET @group_concat_max_len = 4096;

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId');

PREPARE stmt FROM @qryvar;
EXECUTE stmt;

Open in new window

see it at: http://sqlfiddle.com/#!9/b043f/2

in MSSQL there is a "PIVOT" feature, quite different to the above.
http://msdn.microsoft.com/en-AU/library/ms177410(v=sql.105).aspx
I will try that at office tommorow , could you give a brief explanation of what you are doing in that code like what you are setting, col_ref,concat, DC, etc....
mmmmm

GROUP_CONCAT is a MySQL specific function, in the above code the output from just that part is (but I have added line breaks so you can read it):

max(case when eventid=1 then `Date` else NULL end) as eventid_1,
max(case when eventid=3 then `Date` else NULL end) as eventid_3,
max(case when eventid=4 then `Date` else NULL end) as eventid_4,
max(case when eventid=5 then `Date` else NULL end) as eventid_5,
max(case when eventid=8 then `Date` else NULL end) as eventid_8

in other words it is preparing a small snippet of SQL for each (distinct) eventid

then we add some further text to this
set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId');

so the whole @qryvar now actually stores a complete SQL query that looks like this:

select TransactionId,
+ << the above set of case expressions >>
from event Group By TransactionId


select TransactionId,
max(case when eventid=1 then `Date` else NULL end) as eventid_1,
max(case when eventid=3 then `Date` else NULL end) as eventid_3,
max(case when eventid=4 then `Date` else NULL end) as eventid_4,
max(case when eventid=5 then `Date` else NULL end) as eventid_5,
max(case when eventid=8 then `Date` else NULL end) as eventid_8
from event Group By TransactionId

then this dynamically created sql query is actually executed by these 2 lines:

PREPARE stmt FROM @qryvar;
EXECUTE stmt;

suggest you lookup group_concat in the MySQL documentation.
I tried SQL Fiddle, neat tool. I caught one piece you left out

WHERE  projectcomment.commentstatusid = 6 AND projectcomment.commentdate > '2013-05-11'

meaning there should be no result row unless the row contains a date beyond 5/11/13 in EventId 6....
the where conditions may be included in the final concatenation, the harder part is determining the required columns - but the 2 need to be coordinated otherwise you may run out of string length before the query gets to be executed.

i'm guessing you are expecting me to do this??? mmm, do I have the time - I'll see
Please; if you could, above my current skill level
OK, the reason all of that is not included would be:

Unknown column 'projectcomment.commentstatusid'

so, untested, the amended code would look like this (i.e. you debug)
set @FromDate = '2013-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event 
      where projectcomment.commentstatusid = 6 AND projectcomment.commentdate > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId where projectcomment.commentstatusid = 6 AND projectcomment.commentdate > ',@FromDate);

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

if you want to amend the sqlfiddle to include the untested part you are welcome to do so
If having trouble let me know, but I think you will this where clause isn't too difficult really - it goes where it would normally go :)
getting close
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'NULL' at line 1: PREPARE stmt FROM @qryvar

set @FromDate = '2001-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_ ',eventid) 
      as col_ref from event 
      where eventid = 6 AND date > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event Group By TransactionId where eventid = 6 AND date > ' ,@FromDate);

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

see: http://sqlfiddle.com/#!9/b043f/21

there is no data to match these conditions:
where eventid = 6 AND date > @FromDate

altering the conditions to suit the small sample of data, this works, but shows you how to add a where clause
set @FromDate = '2001-05-11';

set @qryvar = (
SELECT 
    GROUP_CONCAT(col_ref)
FROM (
      select distinct
      concat(' max(case when eventid=',eventid,' then `Date` else NULL end) as eventid_',eventid) 
      as col_ref from event
      where eventid < 6 AND `Date` > @FromDate
      ) dc
  );

set @qryvar = concat('select TransactionId,', @qryvar, ' from event where eventid < 6 AND `Date` > ',@FromDate,' Group By TransactionId ');

PREPARE stmt FROM @qryvar;
EXECUTE stmt;
;

Open in new window

So close...
But I still think you need the "IN" clause because your WHERE clause limits which columns are included NOT which rows. We want to limit the rows. On a positive note; the output is formatted correctly.

In your last example  using that data if I were to limit to EventID 8 where EventID 8 occurs > 2001-11-08;  I should only return one row for transaction# 13. That row should have columns as follows

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'
EventID_8 with date '2001-11-09'

Notice the dates in the first three columns should not be filtered upon, only the date in column 8 is checked to see if > 2001-11-08 and if it meets that criteria the entire row should appear. Transaction row #12 is excluded because it does not have an Event #8 with a date > 2001-11-08

Does what I am asking make sense? Are you sure we do not need the IN clause?
It's your where clause, I didn't invent it.

But I think you may have misunderstood one small bit here.

When constructing this dynamic sql, there are 2 quite different steps.

A: we must figure out what the columns need to be,
B: we get the data (rows) that will line up to those columns

because those 2 steps need to be synchronized, "the where clause" MUST be used twice, once in both steps. If this is not done the results will be .... (let's just say wrong).

So, yes the where clause is used to work out the columns
and, the same where clause is used to limit rows
see line 14 of that first code block above, this bit applies to the rows:
' from event where eventid < 6 AND `Date` > ',@FromDate,

Does it need an IN()?

not from what you have shared with me.
this didn't makes sense to me I'm afraid:
Notice the dates in the first three columns should not be filtered upon, only the date in column 8 is checked to see if > 2001-11-08 and if it meets that criteria the entire row should appear. Transaction row #12 is excluded because it does not have an Event #8 with a date > 2001-11-08

Does what I am asking make sense? Are you sure we do not need the IN clause?
I did not propose the where clause to use. If the where clause is incorrect, what is the correct one?

The where clause provided asks for a specific eventid (6)
and 'date' > somedate

at no point have you indicated "dates in the first three columns should not be filtered upon" which to be honest I just don't follow.

I have provided you with the ability to pivot your MySQL data, it's up to you to define the correct filter I'm afraid. If the filtering syntax is an issue I can help with that - but you need to help me by translating that 3 column bit above
We are clear on the columns, all the appropriate columns are appearing properly for each transaction, so we are good on that part.

The confusion is with the rows. I want to include/exclude rows from query result based upon a criteria. That criteria will be if Event X has a date > XX/XX/XX. Does that make sense?
yes, but, that's what the existing filter does, are you heading in this direction perhaps?

WHERE
(evenidid = 6 and 'date' > 2013-05-05)
OR
(evenidid = 8 and 'date' > 2013-05-10)

i.e. the filtering is done in pairs (event & date) or (event & date)...

the point being (for me) is that the filter is still yours to make, and you should by now know where it must be used.

perhaps try developing the required filter in just a normal query (unpivotted) - then apply it to the pivot?

don't forget though that the pivot will produce a matrix that must cover the entire result (both rows and columns)
>>if I were to limit to EventID 8 where EventID 8 occurs > 2001-11-08;  I should only return one row for transaction# 13

see: http://sqlfiddle.com/#!9/b043f/24

That row should have columns as follows

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'
EventID_8 with date '2001-11-09'


NOT in the current query it won't

What is the logic here? you have EXPLICITLY asked for dates > 2001-11-08

none of these are > 2001-11-08

EventID_1 with date '2001-11-01'
EventID_3 with date '2001-11-05'
EventID_5 with date '2001-11-08'

I still don't understand I'm afraid, you will have to explain it very slowly and in detail.
I may be wrong, or explaining something that you already know, but I think he's saying that he wants to include _all_ of the events for a TransactionId, if a specific EventID occurs after a specific date.  TransactionId 13 has EventID 8 with a date after 2001-11-08, so TransactionId 13 is included, with all of its events.  TransactionId 12 does not have an entry for EventID 8 that's after that date (it has no entry for 8 at all), so TransactionId 12 is not included at all.

 Perhaps that's where the use of IN () was supposed to come in?  Include the events for a TransactionId, if that TransactionId is in the list of ones with entries for the desired EventID and date?

 I'm not sure where the desired EventID and date are coming from.  Perhaps entered as parameters at run time?

 James
Thank you James - very nice deduction, which I simply did not see (at all) - and clear to follow
Sounds quite reasonable.

may need humble pie

looks like IN() will be required :(
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
James - exactly! and yes parameters are entered at run time
I was running out of ways to explain, the output looks right now and I see the IN as well
yes, humble pie is on order - but with raspberries to make it palatable

IN() is used so that the filter locates the associated transactionid to whatever that injected parameter(s) is/are

in my defence, telling me to use IN() did not explain the required logic - there was a small part missing :)
Yes, I could have been more clear,just could not find a way
a little bit of a struggle to communicate but in the end we got there
I believe James really should get some kudos here - without that keen observation the solution was in limbo.

It is possible to request a change to points distribution - I shall leave that to your discretion but very pleased you have the solution now.

Cheers,
Paul

and thank you James!
fine with me, what is the procedural to split awarded points?
No points necessary, as far as I'm concerned.  I just helped a bit with communication.  You did all the heavy lifting.  :-)

 And you're welcome.  :-)

 James
Was there anyway to do this WITHOUT a WHERE IN clause?
mmmm, really good question - I will need some think time
Yes; a temporary table of TransactionId values, then inner join to that table.

MySQL now also has stored procedures, so you could pass a set of wanted transactionID's to a procedure and build a result using procedural code.

I'm not recommending either, just exploring possibilities.

where IN(......)
isn't my favorite solution when the bit in the middle is an unknown quantity of items. BUT, avoiding IN() might be as bad or worse than using it.

How many transactionid's do you believe you would be requesting "most of the time"?
if it's tens probably not an issue, if it's a hundred or so might become an issue, if its several hundred or in larger units then think of alternatives now.

Performance isn't just a matter of the query structure, the indexing of the tables and the hardware (disk/ram etc) and current load play a part - but IN() can be a poor performer if "abused". Can't offer you any precise numbers on this point because "it will depend" on several factors.

and X IN(1,2,3,... 1000) -- is the same as below

and ( X = 1 OR X=2 OR X=3 ..... OR X=1000) -- is the same as above