• Status: Solved
  • Priority: Medium
  • Security: Public

SQL SUM query help on a really large table


I have one large table of 150,000 records from which I'll get a phone #.
I have another table that has the phone # and 15 columns of peg counts that I want to sum.  This table has 175 million records.

For example,
phone#       sumPeakMinutes        sum OffPeakMinutes, etc....

I need to get 2 years of data, so it is a lot of info coming back.

What is the best way to go about this?

1.  select phoneNbr, sum(PeakMinutes), sum(OffPeakMinutes), etc...

2.  Create a temp table that is an exact copy of the 1st table, which has the phone #.
Then create a temp table of the summary data, which would drop from 175 million records to 125 million records; then join these 2 tables, run one update statement at a time to get each sum, and spit out the results.

3.  do something else? what?

I do have indexes on these tables where appropriate; it's just taking forever, which I do understand, since we are dealing with a lot of records.
Asked by purdyra1
2 Solutions
 
dqmqCommented:
I vote for one query that joins the two tables:

select t2.phoneNbr, sum(t2.PeakMinutes), sum(t2.OffPeakMinutes)
from table1 t1 inner join table2 t2
on t1.phonenbr = t2.phonenbr
group by t2.phonenbr


 
SheilsCommented:
How about creating a summary table with a last update field.

Then every time you open the database you can run a query on table 2 to get the sums since the last update. You can add the resulting values to the values in the summary table, store the results as variables, and then run an update query on the summary table. You can include a sum <> 0 criterion to limit the updates to phones that have actually been active since the last update.

This way you will only be calculating for activity since the last update which should greatly reduce processing time.
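The incremental approach above could be sketched roughly like this, assuming a hypothetical SummaryTotals table (phoneNbr, PeakMinutes, OffPeakMinutes, LastUpdated) and a Date column on the detail table; all names here are illustrative, and the UPDATE...FROM form is SQL Server syntax:

```sql
-- Illustrative sketch only: table and column names are assumptions.
-- 1) Aggregate activity since the last refresh.
-- 2) Add those deltas into the running totals.
UPDATE s
SET s.PeakMinutes    = s.PeakMinutes    + d.PeakMinutes,
    s.OffPeakMinutes = s.OffPeakMinutes + d.OffPeakMinutes,
    s.LastUpdated    = GETDATE()
FROM SummaryTotals s
JOIN (
    SELECT phoneNbr,
           SUM(PeakMinutes)    AS PeakMinutes,
           SUM(OffPeakMinutes) AS OffPeakMinutes
    FROM table2
    WHERE [Date] > (SELECT MAX(LastUpdated) FROM SummaryTotals)
    GROUP BY phoneNbr
    HAVING SUM(PeakMinutes) <> 0 OR SUM(OffPeakMinutes) <> 0
) d ON d.phoneNbr = s.phoneNbr;
```

The HAVING clause plays the role of the sum <> 0 criterion, skipping phones with no activity since the last refresh.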
 
purdyra1Author Commented:
thx for both responses so far.

@dqmq, using your example it would be more like:

select t1.phoneNbr, sum(t2.PeakMinutes), sum(t2.OffPeakMinutes)
from table1 t1 inner join table2 t2
on t1.phonenbr = t2.phonenbr
group by t1.phonenbr

so you are saying, don't even do the temp table method?

@sb9, may I add some info for clarification: this is only a one-time report (or so they say), so I won't have to update it. I just need to run it, but it is taking forever and ever, so I want to make sure I am doing this the best way, even if it only shortens the query time by a few minutes or hour(s).
 
SheilsCommented:
Normally I would have gone with dqmq's method. But since you have such a large volume of data, you may experience long delays if you run a fresh calculation every time.

However, if it is a one-time-only report, then there is no need for a temp or summary table. The query will simply take some time due to the high volume of data.

The other thing I'd like to point out is that I don't see a real need for joining the two tables, because you are not using any of the fields in t1. All your fields and your GROUP BY are from t2, so why join to t1?
 
Kevin CrossChief Technology OfficerCommented:
I would agree. You may want to see how this performs compared to the other approach by examining the execution plans before running:

select t1.phoneNbr, t2.PeakMinutes, t2.OffPeakMinutes
from table1 t1 
inner join (
   select phoneNbr
        , sum(PeakMinutes) as PeakMinutes
        , sum(OffPeakMinutes) as OffPeakMinutes
   from table2 
   where {your date filter here}
   group by phoneNbr
) t2 on t1.phonenbr = t2.phonenbr
 
Kevin CrossChief Technology OfficerCommented:
Good point, @sb9!  If you truly are not using t1, then you can just use the one table in the query. However, I read it as a simplified version, or possibly a left-join situation I forgot to show earlier -- that would be where not all phone numbers are in the second table, but you want them to show on the report as 0.

select t1.phoneNbr
     , coalesce(t2.PeakMinutes, 0) as PeakMinutes
     , coalesce(t2.OffPeakMinutes, 0) as OffPeakMinutes
from table1 t1 
left outer join (
   select phoneNbr
        , sum(PeakMinutes) as PeakMinutes
        , sum(OffPeakMinutes) as OffPeakMinutes
   from table2 
   where {your date filter here}
   group by phoneNbr
) t2 on t1.phonenbr = t2.phonenbr
 
purdyra1Author Commented:
@both, thx for the help.

@sb9, in my attempt to keep this simple, I did not mention that there is one field in table t1 that is not in t2; that is why I need the one field from t1.  Besides that, everything else is exactly the same.

@mwvisa1, I'll take a look at that, thx.
 
SheilsCommented:
If the query is taking hours then there may be something else wrong.

Try running the query for a shorter period and see what happens.

For example:

select t1.phoneNbr, sum(t2.PeakMinutes) As SumPeak, sum(t2.OffPeakMinutes) As SumOffpeak
from table1 t1
inner join table2 t2 on t1.phonenbr = t2.phonenbr
Where Date(t2.Date) = Date(now())
group by t1.phoneNbr
 
Kevin CrossChief Technology OfficerCommented:
Please show us your existing DATE filter.  If it is constructed like sb9's, where the column in your table has a function around it, the index on that column is probably not being used.
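To illustrate the point (a sketch using the thread's table, with assumed columns): wrapping the indexed column in a function usually forces a scan, while an equivalent bare-column range predicate lets the engine seek the index.

```sql
-- Probably a scan: the function on the column hides it from the index.
select mdn, sum(PeakMins)
from tblUsgeDly
where Date([date]) = Date(Now())
group by mdn;

-- Index-friendly rewrite (SQL Server syntax): compare the bare column to a range.
select mdn, sum(PeakMins)
from tblUsgeDly
where [date] >= cast(getdate() as date)
  and [date] <  dateadd(day, 1, cast(getdate() as date))
group by mdn;
```

Both forms return the same rows for "today", but only the second can use an index seek on [date].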
 
dqmqCommented:
>Are you saying don't use the temp table

Yes.  For a one-time report, I don't see how the temp table would speed things up.  

Since you are doing this for a report, it may be faster to take the summing out of the query and let the report do that.
 
SheilsCommented:
A few naive questions to help clarify this a bit more for me:

What is a peg count?
What are the other 12 fields (you said there are 15)? Are they used in the query?
Are PeakMinutes and OffPeakMinutes actual fields in t2, or are there FROM and TO fields used to derive the peak and off-peak minutes?
 
purdyra1Author Commented:
@mwvisa1, here it is.   92% of the cost goes to the date filter, which has a clustered index on it.

select iLoyCard, iMDN,
       sum(PeakMins), sum(Free_PeakMins), sum(Incl_PeakMins)
from tblHmby h
inner join tblUsgeDly ud on h.iWirelessMDN = ud.mdn
where date > '5/3/10'
group by iLoyCard, iMDN
 
Kevin CrossChief Technology OfficerCommented:
try:

select iLoyCard, iMDN,
       sum(PeakMins), sum(Free_PeakMins), sum(Incl_PeakMins)
from tblHmby h
inner join tblUsgeDly ud on h.iWirelessMDN = ud.mdn
where date > convert(datetime, '2010-05-03')
group by iLoyCard, iMDN

Just for kicks.
 
SheilsCommented:
select iLoyCard, iMDN,
       sum(PeakMins), sum(Free_PeakMins), sum(Incl_PeakMins)
from tblHmby h
inner join tblUsgeDly ud on h.iWirelessMDN = ud.mdn
where date = Date(Now())
group by iLoyCard, iMDN

This should sum only today's records and give you a really fast response. If the response is still slow, then we need to start looking deeper into the underlying structure of the table, data types, etc.
 
purdyra1Author Commented:
@sb9, that does give me a pretty fast response.  5 seconds or so.
 
Kevin CrossChief Technology OfficerCommented:
Have you checked the execution plan to see what it is doing?  See if there are any bookmark lookups.  You might need your index to cover some of the other fields if you are always pulling the same fields based on the indexed column(s).
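For example, a covering index could let the aggregate be answered from the index alone and avoid bookmark lookups entirely. The index name and the INCLUDE list below are assumptions based on the query shown earlier (SQL Server syntax):

```sql
-- Hypothetical covering index: key on the filtered/joined columns,
-- with the summed columns carried in the leaf via INCLUDE.
create index IX_tblUsgeDly_date_mdn
on tblUsgeDly ([date], mdn)
include (PeakMins, Free_PeakMins, Incl_PeakMins);
```

With this in place, the date-range filter can seek on [date], and every column the query reads is present in the index leaf, so no lookups back to the base table are needed.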
 
SheilsCommented:
WOW. 5 seconds for one day. That translates to 1,825 seconds over 365 days. Noting that probably less than 25% of all the phone numbers were in use today, this may come to around 7,300 seconds, which equates to about 2 hours.

Try running a sum query on just tblUsgeDly and see if things speed up. Then you could run the join after the sum. Not sure if it will work, but it's worth a try.

E.g.:

select h.iLoyCard, h.iMDN, q1.SumPeak, q1.SumOffpeak, q1.SumIncl
from (
    select mdn,
           sum(PeakMins) as SumPeak, sum(Free_PeakMins) as SumOffpeak, sum(Incl_PeakMins) as SumIncl
    from tblUsgeDly
    where date = Date(Now())
    group by mdn
) q1
inner join tblHmby h on h.iWirelessMDN = q1.mdn
 
Kevin CrossChief Technology OfficerCommented:
Kind of what I was suggesting here. *smile* http:#a35701663
But seriously, once you have that query constructed, please grab an estimated execution plan and see if anything jumps out, like any scans or lookups. http:#a35702038
 
purdyra1Author Commented:
ok, thanks for all of the help.  In all of my copying and changing, I am wondering if I was searching on the year 2010.  I think I had 5/1/2010, and it was taking 7 minutes, but I thought I had 2011, so I was thinking in my head: for 3 days, 7 minutes, so how long will this take for the 2 years of data I needed?

Either way, I ran it now and it took about 16 minutes, which is doable.  I am going to load it into a table and then spit it out to Excel for the user.  thx for all of the help, i learned a lot.
 
Kevin CrossChief Technology OfficerCommented:
Glad it worked out for you and we could help in some way.
Best regards and happy coding,

Kevin
 
SheilsCommented:
I am off to work now. Hope all goes well for you.

Reading mwvisa's execution plan, it seems that we are suggesting the same thing. Basically, you have to run the sum on t2 first. This reduces the join work: instead of joining 150,000 records to 175 million records and then summing over that huge intermediate result, you run the sum on the 175M records first, which collapses them to roughly 150,000 rows, and then join those to t1. Bottom line: there are far fewer joined rows to process.
