Avatar of timbrigham
timbrighamFlag for United States of America

asked on 

Nested / join syntax

I'm having a little trouble designing a nested or join query. I'm referencing two tables, user_activity and firewall_traffic. I'm running MySQL 5.0.22
Both are indexed on the fields I'm looking for, and both are varchar(20) containing the same data.
The tables in question are on the large side (to me anyways), with 2 and 4 million rows, respectively.
I've included my unsuccessful queries.

select * from firewall_traffic where date='2008-07-16' and src in ( select addr from user_activity where date='2008-07-16' and addr != '' and addr != '-' and user='testuser' group by addr );
 
select * from firewall_traffic where date='2008-07-16' and src = any( select addr from user_activity where date='2008-07-16' and addr != '' and addr != '-' and user='testuser' group by addr );
 
select * from firewall_traffic inner (or straight or just plain)  join user_activity on firewall_traffic.src = user_activity.addr
where date='2008-07-16' and user='testuser'

Open in new window

DatabasesMySQL ServerSQL

Avatar of undefined
Last Comment
timbrigham
SOLUTION
Avatar of carazuul
carazuul

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of timbrigham
timbrigham
Flag of United States of America image

ASKER

carazuul, I tried your suggestion. Still no go. I've let the query process for 10 min without finishing.
Might it just be the size of the table? To be honest, I've never worked with tables more than 15,000 rows or so before. The server should be plenty powerful enough - it's got 4 gigs of RAM and two dual core 3GHz processors.

Here's the create table results:

CREATE TABLE `firewall_traffic` (
  `host` varchar(32) default NULL,
  `date` date default NULL,
  `time` time default NULL,
  `src` varchar(20) default NULL,
  `dst` varchar(20) default NULL,
  `dstname` varchar(50) default NULL,
  `msg` text,
  `seq` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`seq`),
  KEY `host` (`host`),
  KEY `date` (`date`,`time`),
  KEY `src` (`src`,`dst`),
  KEY `dstname` (`dstname`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `user_activity` (
  `host` varchar(32) default NULL,
  `date` date default NULL,
  `time` time default NULL,
  `evtlog` varchar(32) default NULL,
  `evtid` int(11) default NULL,
  `source` varchar(32) default NULL,
  `user` varchar(50) default NULL,
  `type` varchar(32) default NULL,
  `category` varchar(32) default NULL,
  `msg` text,
  `addr` varchar(20) default NULL,
  `seq` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`seq`),
  KEY `host` (`host`),
  KEY `date` (`date`,`time`),
  KEY `evtlog` (`evtlog`),
  KEY `evtid` (`evtid`),
  KEY `source` (`source`),
  KEY `type` (`type`),
  KEY `category` (`category`),
  KEY `user` (`user`),
  KEY `addr` (`addr`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

And my.cnf:
[mysqld]
datadir=/data
socket=/var/lib/mysql/mysql.sock

[mysql.server]
user=mysql
basedir=/var/lib

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
SOLUTION
Avatar of FrivolousSam
FrivolousSam

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
Avatar of timbrigham
timbrigham
Flag of United States of America image

ASKER

FrivolousSam:
The differences in the not queries were just copy and paste errors - I had tried about 20 different variations when I posted my question, and I just selected the wrong one.
I've tried executing the query provided, including the limit clause. The only thing I changed was the date line to firewall_traffic.date='2008-07-16' (I was getting ambiguity errors), plus activating the limit line.

The speed problem persists. With a limit 1, the query took over 5 minutes to execute. That's definitely better, but it's still longer than I'd expect.  

I've been thinking about your comment about the indexes. All the columns you mentioned are indexed, but a few have multiple elements in the their indexes; firewall traffic `date` (`date`,`time`),  `src` (`src`,`dst`, and user `date` (`date`,`time`), to be precise. Do multi part indexes hurt anything here?
Avatar of timbrigham
timbrigham
Flag of United States of America image

ASKER

I've found a working solution to this problem, albeit not quite what I was expecting.
I went back to thinking about the pure volume of data being joined. Since a basic Cartesian join on my data set would be  8.0 × 10^12 rows, it makes sense that I needed  to reduce the original data set to something a little more manageable. I created a couple temporary tables to store the exact ranges I'm looking for, which drastically sped up the process.
There are only about 4,700,000 entries to cross reference with the new tables.
The total time is only about 45 - 60 seconds now, including creating the temporary tables.
create temporary table usr select user,addr from user_activity where date='2008-07-16' and user = 'testuser' and addr != '' and addr != '-' group by user,addr;
And one from the firewall log
create temporary table fw select * from firewall_traffic where date='2008-07-16';
select user,dstname,count(*)  from usr join fw on usr.addr = fw.src where group by user,dstname;

Open in new window

SOLUTION
Avatar of virmaior
virmaior
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
ASKER CERTIFIED SOLUTION
Avatar of timbrigham
timbrigham
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
Databases
Databases

Databases are organized collections of data, most commonly accessed through management systems including schemas, tables, queries and processes that allow users to enter and manipulate the information or utilize it in other fashions, such as with web applications or for reporting purposes.

62K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo