Solved

Access 2007, complex query

Posted on 2010-09-23
24
416 Views
Last Modified: 2013-11-27
I have a table that consists of 3 columns: state, rate and zip. Each record in this table is unique. Every state has several zip codes and each one of those has a rate assigned to it. The table has about 40,000 records and I need to get it to a much smaller number. I need to combine some of the records and use zip ranges that have the same rate. Example:
State    Zip       Rate
AL       02103   .07
AL       02550   .07
AL       05982   .07

I need to make it look like this:
State    Zip_from     Zip_to      Rate
AL         02103        05982       .07

That would've been easy as I could do group by state and group by zip query and then add 2 columns Min(zip) and Max(zip), BUT there are occasions when we could find in the same state there is another record with the zip code that falls into the same range, but with different rate: to add to example above there could be a record like this:

State    Zip       Rate
AL       03276   .08

It falls within this range:

State    Zip_from     Zip_to      Rate
AL         02103        05982       .07

but it has a different rate. So, these ranges have to be somehow evaluated to see if there  are other records with different zip codes and rates that would fall within the range. If so, the range would have to be stopped and then restarted. So, following the example above we would have 2 ranges instead of 1:

this range:
State    Zip_from     Zip_to      Rate
AL         02103        05982       .07
will become 2 ranges:
State    Zip_from     Zip_to      Rate
AL         02103        02550       .07
AL         05982        05982       .07

That way this record:
State    Zip       Rate
AL       03276   .08
will be in it's own range.

I hope I explained it well enough.
Thanks
0
Comment
Question by:gkhotline
  • 7
  • 7
  • 6
  • +3
24 Comments
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33744526
Why do you feel the need to make this transition?  

I would advise against it for just the reason you are proposing.  What happens when the rate changes for a particular zip that is in the middle of a range.  You now have to parse that one record into 3.  40,000 records is a small table for a well designed relational database.

Leave it the way it is, it is far easier to use in this format, despite a slight database size decrease.
0
 

Author Comment

by:gkhotline
ID: 33744655
Well, if the rate changes it would just be assigned into already existing range or to a new range of it's own. So it could only create one more record at the maximum. 40,000 records is only for now. The total number may increase significantly and we need to make the operation as quick as possible with the fewest records possible. I have been given this task and it has to be done. I have been trying different things, but cant figure it out.
Thanks
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 33744837
gkhotline,I have to second fyed's recommendation to leave your table as is.  40k records is not a lot in the grand scheme of things, and your current design is exactly what I would have recommended a priori.Think about it: today you might have a valid range of 02103 - 05982, but if tomorrow that range changes you might have to do a heck of a lot of work to catch up.Patrick
0
 

Author Comment

by:gkhotline
ID: 33744876
as a followup: i need some sort of a code that would go through every record and for every state where we have same rate - look at the zip codes 1 by 1 and see if any of them go in order (ex. 02110,02111, 02112). If that is the case - put them into a range. So range would only be created if there are consecutive zip codes for a same state and rate. The rest of them would just be individual ranges of 1 record
0
 

Author Comment

by:gkhotline
ID: 33744928
Basically, I would be importing about 70,000 records every month into this tool that I am building and then will run all these queries to try to get it down to a much smaller number. Right now it is down to 40,000 and I need to make it even smaller - that is my requirement. Then I would export this file to be used in a different application. And the structure that it needs to have is:

State   zip_from   zip_to    rate

Any suggestions?
0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33750397
So what you are basically saying is that you have no choice, and management will not listen to common sense that says the current structure, when indexed properly will produce excellent results, without the added headache of this unnecessary step.
0
 
LVL 31

Expert Comment

by:Helen_Feddema
ID: 33750439
That does happen, from time to time.  Sometimes, after the warned-about problems occur, they will changes their minds.
0
 
LVL 47

Accepted Solution

by:
Dale Fye (Access MVP) earned 250 total points
ID: 33750484
Given that, I would do something that looks like the following.

   
Public Sub MergeZipRates()



    Dim db As DAO.Database

    Dim rsS As DAO.Recordset, rsD As DAO.Recordset

    Dim strPrevState As String, strPrevZip As String, sngPrevRate As Single

    

    On Error GoTo ProcError

    

    Set db = CurrentDb

    Set rsS = db.OpenRecordset("SELECT * FROM tbl_Zip ORDER BY [State], [Zip]", , dbFailOnError)

    

    db.Execute "DELETE * FROM tbl_ZipNew"

    Set rsD = db.OpenRecordset("SELECT State, Zip_From, Zip_To, Rate FROM tbl_ZipNew", , dbFailOnError)



    While Not rsS.EOF

        If rsS!State = strPrevState And rsS!Rate = sngPrevRate Then

            strPrevZip = rsS!Zip

        Else

            rsD!Zip_To = strPrevZip

            rsD.Update

            rsD.AddNew

            rsD!State = rsS!State

            strPrevState = rsS!State

            rsD!Zip_From = rsS!Zip

            strPrevZip = rsS!Zip

            rsD!Rate = rsS!Rate

            sngPrevRate = rsS!Rate

        End If

        rsS.MoveNext

    Wend

    

ProcExit:

    rsS.Close

    Set rsS = Nothing

    

    rsD!Zip_To = strPrevZip

    rsD.Update

    rsD.Close

    Set rsD = Nothing

        

    Exit Sub

ProcError:

    If Err.Number = 3020 Then   'Update or CancelUpdate without AddNew or Edit.

        Resume Next

    Else

        MsgBox Err.Number & vbCrLf & Err.Description

        Resume ProcExit

    End If

    

End Sub

Open in new window

0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33750498
sample database attached.
ZipRanges.mdb
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 250 total points
ID: 33750782
Can do that in a straight query...



select state, min(zip) as start_zip, max(zip) as end_zip, rate

from (

      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break

      FROM tbl_Zip as Z

      ORDER BY z.state, z.zip, z.rate

     ) as SQ

Group by state, rate, next_rate_break

order by 1,2,3,4

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33750801
Oops, forgot to add the database back, and hope fyed doesnt mind too much that I borrowed it :D
ZipRanges.mdb
0
Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33752389
Mark,

I prefer SQL solutions over code, but I just cannot get my brain to work this early.  I'll have to take a look at this later.  The vba took me about 5 minutes, it would probably have taken me an hour to come up with the SQL.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33752485
G'Day fyed,

For me, it is the other way round :) And didnt mean anything by that comment "it can be done as a straight query" in reference to your code, just that it could also be done as a query. Hope I didnt offend. And yes, it was a bit cheeky to grab your database - ta muchly for posting it though :)
0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33754109
No offense taken.  The point is to help the OP, and learn a bit at the same time.

I'm pretty good at nested queries, but start to get confused with 3rd order nesting.
0
 

Author Comment

by:gkhotline
ID: 33754624
You guys are good! It will take me some time to understand step by step what you did... How do you want me to split the points? Given the fact fyed gave his solution first... Just want to be fair...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33754749
However you like - it is your question :)  

And please just ask if there is anything I can help with explaining what the heck that query is really doing :) :)
0
 

Author Comment

by:gkhotline
ID: 33754860
Thank you Mark,
Would that be too much to ask to tell me step by step what you are doing? I could possible create a new question for this so that you dont waist your time for nothing.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33754939
No worries, and no need for another question... Explaination is an integral part of the solution :)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33755204
So, first step is understanding what the real problem is. Believe it or not, you actually had the solution, just didnt recognise it as a query...

>> That would've been easy as I could do group by state and group by zip query and then add 2 columns Min(zip) and Max(zip),

followed by :

>> BUT there are occasions when we could find in the same state there is another record...

So, the basis is we can use min and max except when there is a break point. That break point is an intercepting zip code with a different rate. And you provided that information, you knew my query before I did *laughing*.

Step 1 was to then start constructing the "raw" elements, gathering the necessary information so we can run a query.

What it meant in this case was starting with a fairly "raw" query to see if we can get those breakpoints.

      SELECT top 100 percent z.state,z.zip,z.rate,  '???????'  as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate

the challenge then becomes how do we find that breakpoint ? We might be able to join, but in reality, we are really looking for just the very next instance of a zip code with a different rate from the "current" row.

So we introduce an in-line query :

     (select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break

Now it doesnt run on its own because it needs the "current" row as a reference. We also dont have to be hugely concerned where it lives in the overall scheme of things (just yet) so long as we do know that for "this" current series, it will stop at "next_rate_break" if there is one.

Lets now incorporate that back into our query replacing the question marks:

      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate

And running that we see that the detail rows are in fact picking up an instance of "next break point" when there really is one. But the rows look a bit of a mess.

So, now we need to filter out that information, so we wrap up the above inside another query. ie, it becomes a subquery

select state, min(zip) as start_zip, max(zip) as end_zip, rate
from (
      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate
     ) as SQ
Group by state, rate, next_rate_break
order by 1,2,3,4


Notice how we wrap up our supquery in brackets and give it an alias (a name) of "SQ". This is also where Access can be a bit tiresome because it reckons it knows better and ssometimes tries to replace some of our brackets with sqare brackets next time you edit it. Just be careful of that ;)

So, we now have an outer query selecting from our inner or subquery "SQ" and we need to (now) find the appropriate pieces from that raw information to report on.

Back to your original observations. We can use min and max except where there is a breakpoint. Guess what, we now know that there is a breakpoint, and can use that vital piece of information to help the MIN and MAX aggregate functions.

For our outer query, we already know we want to group things by State and rate, and as observed, that might be broken up because of those breakpoints, and so, that also must be included into our grouping...

   select state, min(zip) as start_zip, max(zip) as end_zip, rate
   from (...) as SQ
   Group by state, rate, next_rate_break
   order by 1,2,3,4

Which gives us the group by clause  "   Group by state, rate, next_rate_break " and all we really have to do know is get the state and rate, and then the min(zip) and max(zip) within that breakpoint.

Oh, and to finish it off we also have to sort it in some sequence and I tend to use the "column poistion" type notation rather than column names (seen in the order by)

So, it really becomes a compound select based on facts / information as we can gather...
1) get the "raw" rows (our subquery)
2) add in those breakpoints that will upset the sequence (ie a different rate within that state ) (add an inline query into our subquery)
3) Now select and present that information (the outer query)

Does that help explain it a bit more ?
0
 

Author Comment

by:gkhotline
ID: 33756283
Wow,
I didnt expect such a detailed answer! Thank you very much! I will go through it a little bit later, but it looks like if I can't understand this then all is lost for me.
Again, thank you! I will post my comments later...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33758299
No problems....

I do apologise for all the typos in there - and didnt explain why I was selecting top 100 percent.

That part is an old habit. In SQL if you use an ORDER BY then you normally need a selection qualifier for that ORDER BY to work within a sub query or view in a lot of different databases (e.g. MS SQL Server). So, when / if using an order by as part of a subquery, then I normally throw in the top 100 percent.

In reality I am not so sure that the order by nor the top 100 percent is actually required in the subquery apart from wanting to make sure that we do have all the postcodes in correct sequence for the state (regardless of when the postcodes were added to the table).

In those type of situations, if you are not sure what the sequence will really be, then always best to take control of it yourself to avoid any unexpected results. In the above, however, the inline query will always get the "next-break-point" regardless of sequence because that too is also using an order by and so could get some performance improvement if we really needed to by testing the need for the "top 100 percent" and "order by" within the "SQ" subquery (it is definitely needed in the inline query).

Also, if you have any "where" qualifiers then you would first look to add them to the "SQ" subquery part to minimise the amount of "raw" data, or, you can add it to the end as well if it is performing OK given the volume of data you have.

See ? Always room for more explanation, so, if it doesnt make sense, simply ask away :)
0
 
LVL 30

Expert Comment

by:hnasr
ID: 33758784
I go with mark_wills's comment ID:33750782. It is easier for me to check. Did not check other comments, whicj may give you the same accepted result.

If you still need more clarification to this comment, I may help.
0
 

Author Closing Comment

by:gkhotline
ID: 33788576
Thank you for your solutions and explanations. Very, very helpful!
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

In a multiple monitor setup, if you don't want to use AutoCenter to position your popup forms, you have a problem: where will they appear?  Sometimes you may have an additional problem: where the devil did they go?  If you last had a popup form open…
Workbook link problems after copying tabs to a new workbook? David Miller (dlmille) Intro Have you either copied sheets to a new workbook, and after having saved and opened that workbook, you find that there are links back to the original sou…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now