Link to home
Start Free TrialLog in
Avatar of betagirl
betagirl

asked on

Problem with subquery pulling correct data

I'm having a problem getting a subquery to produce the data I need - here's the db info (using PHPList software - I did not design this db - I'm just creating  a custom report to get data out of it - field names that should match up from table to table, don't and  I'm not sure why they designed it this way):

Here are the relevant db tables (mysql 4.1.11):
--
-- Table structure for table 'phplist_list'
--

CREATE TABLE phplist_list (
  id int(11) NOT NULL auto_increment,
  name varchar(255) NOT NULL default '',
  description text,
  entered datetime default NULL,
  listorder int(11) default NULL,
  prefix varchar(10) default NULL,
  rssfeed varchar(255) default NULL,
  modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  active tinyint(4) default NULL,
  owner int(11) default NULL,
  PRIMARY KEY  (id),
  KEY nameidx (name),
  KEY listorderidx (listorder)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=21 ;

-- --------------------------------------------------------
-- Table structure for table 'phplist_listuser'
--

CREATE TABLE phplist_listuser (
  userid int(11) NOT NULL default '0',
  listid int(11) NOT NULL default '0',
  entered datetime default NULL,
  modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (userid,listid),
  KEY userenteredidx (userid,entered),
  KEY userlistenteredidx (userid,listid,entered)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- --------------------------------------------------------

--
-- Table structure for table 'phplist_user_user'
--

CREATE TABLE phplist_user_user (
  id int(11) NOT NULL auto_increment,
  email varchar(255) NOT NULL default '',
  confirmed tinyint(4) default '0',
  blacklisted tinyint(4) default '0',
  bouncecount int(11) default '0',
  entered datetime default NULL,
  modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  uniqid varchar(255) default NULL,
  htmlemail tinyint(4) default '0',
  subscribepage int(11) default NULL,
  rssfrequency varchar(100) default NULL,
  `password` varchar(255) default NULL,
  passwordchanged date default NULL,
  disabled tinyint(4) default '0',
  extradata text,
  foreignkey varchar(100) default NULL,
  PRIMARY KEY  (id),
  UNIQUE KEY email (email),
  KEY foreignkey (foreignkey),
  KEY idx_phplist_user_user_uniqid (uniqid),
  KEY emailidx (email),
  KEY enteredindex (entered)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4295 ;
-- --------------------------------------------------------

Here's what I am trying to do:
For each list (in phplist_list - referenced in phplist_listuser)
  - get the total number of subscribers
  - get  the total number of confirmed subscribers
  - get the total number of unconfirmed subscribers

Here is the sql statement I'm using -
SELECT listid, COUNT(*) AS subscriber_count,
      (SELECT plu.listid, COUNT( * ) AS confirmed
      FROM phplist_user_user puu, phplist_listuser plu
      WHERE confirmed !=0
      AND plu.userid = puu.id GROUP BY listid)
FROM `phplist_listuser` plu GROUP BY listid

Here are the results:
listid       subscriber_count       confirmed_count       unconfirmed_count
1       6       1       0
2       123       0       1
3       66       1       0
4       93       1       0
5       93       1       0

The values in the confirmed_count and unconfirmed_count are incorrect.  The field 'confirmed' takes a value of 0 or 1 - but as you see, the totals in that field for users on a given list are not correctly being summed up.  

Here are the statements broken out, that do get the correct values:

GETS  TOTAL:
------------------
SELECT listid, COUNT(*) AS subscriber_count FROM `phplist_listuser` plu GROUP BY listid

GETS TOTAL CONFIRMED FOR EACH LIST:
----------------------------------------------------
SELECT plu.listid, COUNT( * ) AS confirmed
FROM phplist_user_user puu, phplist_listuser plu
WHERE confirmed !=0
AND plu.userid = puu.id
GROUP BY plu.listid

GETS TOTAL UNCONFIRMED FOR EACH LIST:
-------------------------------------------------------
SELECT plu.listid, COUNT( * ) AS unconfirmed
FROM phplist_user_user puu, phplist_listuser plu
WHERE confirmed =0
AND plu.userid = puu.id
GROUP BY plu.listid
 
I'm having a problem combining these statements to produce a single set of results that I can then display.  Any assistance is appreciated.

Avatar of dbeneit
dbeneit
Flag of Spain image

try,
SELECT listid, COUNT(*) AS subscriber_count,
      (SELECT  COUNT( * ) AS confirmed
      FROM phplist_user_user puu, phplist_listuser plu
      WHERE confirmed !=0
      AND plu.listid= puu.listid listid)
FROM `phplist_listuser` plu GROUP BY listid
excuse me , error
Ok
SELECT plu2.listid,
COUNT(*) AS subscriber_count, tmptab.confirmed, tmptab.unconfirmed
 FROM `phplist_listuser` plu2  
left join
(select listid,
sum(case when puu.confirmed =0 then 1 else 0 END CASE) as unconfirmed,
sum(case when puu.confirmed !=0 then 1 else 0 END CASE) as confirmed
 FROM `phplist_listuser` plu  
   inner join phplist_user_user puu
group by listid) as tmptab

on plu2.listid = tmptab.listid
GROUP BY plu2.listid
ASKER CERTIFIED SOLUTION
Avatar of Aleksandar Bradarić
Aleksandar Bradarić
Flag of Serbia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
No need to sum, no need for sub-selects to slow you down. Just a couple of LEFT JOINs...
leannonn,

sub-select could not be slows.
Imagine that phplist_user_user have 1.000.000 of rows.
subselect return one row for distinct listid.  supposed 50 records by listid... 20000 rows...
i think that Join 20000 rows is more faster than join 1.000.000 X 1.000.000 of rows
Avatar of betagirl
betagirl

ASKER

Thanks so much! Quick work you two!  Leannonn's worked perfectly - dbeneit - your solution threw this error:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CASE ) as unconfirmed ,  sum( case when puu.confirmed != 0 then

??
In your query:

SELECT listid, COUNT(*) AS subscriber_count,
      (SELECT plu.listid, COUNT( * ) AS confirmed
      FROM phplist_user_user puu, phplist_listuser plu
      WHERE confirmed !=0
      AND plu.userid = puu.id GROUP BY listid)
FROM `phplist_listuser` plu GROUP BY listid

What is the third column? It will either throw an error or try to give you one single value, namely the first field (plu.listid) of the first row, which is your first list, since you don't link the subquery to the main query. Note that you use the same alias "plu" in the main query and in the subquery!

You probably meant:

SELECT listid, COUNT(*) AS subscriber_count,
      (SELECT COUNT( * )
      FROM phplist_user_user puu
      WHERE confirmed !=0
      AND plu.userid = puu.id GROUP BY listid) AS confirmed
FROM `phplist_listuser` plu GROUP BY listid

But it's tricky to use a count subquery in another total query; I would advise against it at the moment.

Try something simple first:

SELECT listid, Count(*), Sum(confirmed), Sum(not confirmed)
FROM phplist_listuser plu INNER JOIN phplist_user_user puu ON plu.userid = puu.id

If you want to count using suqueries, you should do so from a non-total query (for clarity and optimization)

SELECT pl.id, pl.name,
  (SELECT Count(*) FROM phplist_listuser WHERE listid=pl.id) As total_count,
  (SELECT Count(*)
    FROM phplist_listuser
    WHERE listid=pl.id
      AND userid In(SELECT id FROM phplist_user_user WHERE confirmed)
  ) As confirmed,
  (SELECT Count(*)
    FROM phplist_listuser
    WHERE listid=pl.id
      AND userid In(SELECT id FROM phplist_user_user WHERE not confirmed)
  ) As unconfirmed,
FROM phplist_list pl

Cheers!
(°v°)
betagirl,
excuse me, I haven't installed any database in my machine and I cann't test the sql, the error was the 'case' after end clause
I have seen that the accepted response was leannon, no problem. Test this sql please.

SELECT plu2.listid,
COUNT(*) AS subscriber_count, tmptab.confirmed, tmptab.unconfirmed
 FROM `phplist_listuser` plu2  
left join
(select listid,
sum(case when puu.confirmed =0 then 1 else 0 END ) as unconfirmed,
sum(case when puu.confirmed !=0 then 1 else 0 END ) as confirmed
 FROM `phplist_listuser` plu  
   inner join phplist_user_user puu
group by listid) as tmptab

on plu2.listid = tmptab.listid
GROUP BY plu2.listid
LOL. was I slow... -- (^v°)
dbeneit - hmmmm- that query returned incorrect data - for example -
listid       subscriber_count       confirmed       unconfirmed
1       6       24726       1056

list 1 only has 6 subscribers, 5 confirmed and 1 unconfirmed.  Also, it took 34.977 seconds whereas the solution used took .0011 seconds.  

To your point about processing time as the list grows, I'm honestly not too worried about it at the moment. Once our total subscriber base gets near 100,000 - we'll be moving this off our servers entirely and use a hosted solution.  PHPList is just our "starter" mailing list software (it's open source and pretty full featured).  But I appreciate the info you provided on that.
harfang - thanks for your explanation about the issues with the query I was trying - going to stick with the one from leannonn - I'll be providing additional info (based on past 7 days, past 30 days, etc) and adapting the query from leannonn was super-simple -

SELECT
  plu.listid,
  COUNT(DISTINCT plu.userid) AS subscriber_count,
  COUNT(DISTINCT p1.id) AS confirmed_count,
  COUNT(DISTINCT p2.id) AS unconfirmed_count
FROM
  `phplist_listuser` plu
  LEFT JOIN `phplist_user_user` p1 ON p1.id = plu.userid AND p1.confirmed <> 0
  LEFT JOIN `phplist_user_user` p2 ON p2.id = plu.userid AND p2.confirmed = 0
 WHERE DATE_SUB(CURDATE(),INTERVAL 7 DAY) <= plu.entered
GROUP BY plu.listid  
 
Thanks again!
umm I lost the on clause ...

SELECT plu2.listid,
COUNT(*) AS subscriber_count, tmptab.confirmed, tmptab.unconfirmed
 FROM `phplist_listuser` plu2  
left join
(select listid,
sum(case when puu.confirmed =0 then 1 else 0 END ) as unconfirmed,
sum(case when puu.confirmed !=0 then 1 else 0 END ) as confirmed
 FROM `phplist_listuser` plu  
   inner join phplist_user_user puu
on   plu.userid = puu.id
group by listid) as tmptab
on plu2.listid = tmptab.listid
GROUP BY plu2.listid
dbeneit - works great! and took only .1133 secs :-)  
> i think that Join 20000 rows is more faster than join 1.000.000 X 1.000.000 of rows

You're not reading the SQL correctly. The query does not join all the rows in two tables. Read the ON clause for both ad you'll see that the joined tables are no larger than the number of confirmed/unconfirmed subscribers per list.