Solved

Access 2007, complex query

Posted on 2010-09-23
24
455 Views
Last Modified: 2013-11-27
I have a table that consists of 3 columns: state, rate and zip. Each record in this table is unique. Every state has several zip codes and each one of those has a rate assigned to it. The table has about 40,000 records and I need to get it to a much smaller number. I need to combine some of the records and use zip ranges that have the same rate. Example:
State    Zip       Rate
AL       02103   .07
AL       02550   .07
AL       05982   .07

I need to make it look like this:
State    Zip_from     Zip_to      Rate
AL         02103        05982       .07

That would've been easy as I could do group by state and group by zip query and then add 2 columns Min(zip) and Max(zip), BUT there are occasions when we could find in the same state there is another record with the zip code that falls into the same range, but with different rate: to add to example above there could be a record like this:

State    Zip       Rate
AL       03276   .08

It falls within this range:

State    Zip_from     Zip_to      Rate
AL         02103        05982       .07

but it has a different rate. So, these ranges have to be somehow evaluated to see if there  are other records with different zip codes and rates that would fall within the range. If so, the range would have to be stopped and then restarted. So, following the example above we would have 2 ranges instead of 1:

this range:
State    Zip_from     Zip_to      Rate
AL         02103        05982       .07
will become 2 ranges:
State    Zip_from     Zip_to      Rate
AL         02103        02550       .07
AL         05982        05982       .07

That way this record:
State    Zip       Rate
AL       03276   .08
will be in it's own range.

I hope I explained it well enough.
Thanks
0
Comment
Question by:gkhotline
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 7
  • 6
  • +3
24 Comments
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33744526
Why do you feel the need to make this transition?  

I would advise against it for just the reason you are proposing.  What happens when the rate changes for a particular zip that is in the middle of a range.  You now have to parse that one record into 3.  40,000 records is a small table for a well designed relational database.

Leave it the way it is, it is far easier to use in this format, despite a slight database size decrease.
0
 

Author Comment

by:gkhotline
ID: 33744655
Well, if the rate changes it would just be assigned into already existing range or to a new range of it's own. So it could only create one more record at the maximum. 40,000 records is only for now. The total number may increase significantly and we need to make the operation as quick as possible with the fewest records possible. I have been given this task and it has to be done. I have been trying different things, but cant figure it out.
Thanks
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 33744837
gkhotline,I have to second fyed's recommendation to leave your table as is.  40k records is not a lot in the grand scheme of things, and your current design is exactly what I would have recommended a priori.Think about it: today you might have a valid range of 02103 - 05982, but if tomorrow that range changes you might have to do a heck of a lot of work to catch up.Patrick
0
Back Up Your Microsoft Windows Server®

Back up all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 

Author Comment

by:gkhotline
ID: 33744876
as a followup: i need some sort of a code that would go through every record and for every state where we have same rate - look at the zip codes 1 by 1 and see if any of them go in order (ex. 02110,02111, 02112). If that is the case - put them into a range. So range would only be created if there are consecutive zip codes for a same state and rate. The rest of them would just be individual ranges of 1 record
0
 

Author Comment

by:gkhotline
ID: 33744928
Basically, I would be importing about 70,000 records every month into this tool that I am building and then will run all these queries to try to get it down to a much smaller number. Right now it is down to 40,000 and I need to make it even smaller - that is my requirement. Then I would export this file to be used in a different application. And the structure that it needs to have is:

State   zip_from   zip_to    rate

Any suggestions?
0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33750397
So what you are basically saying is that you have no choice, and management will not listen to common sense that says the current structure, when indexed properly will produce excellent results, without the added headache of this unnecessary step.
0
 
LVL 31

Expert Comment

by:Helen_Feddema
ID: 33750439
That does happen, from time to time.  Sometimes, after the warned-about problems occur, they will changes their minds.
0
 
LVL 47

Accepted Solution

by:
Dale Fye (Access MVP) earned 250 total points
ID: 33750484
Given that, I would do something that looks like the following.

   
Public Sub MergeZipRates()

    Dim db As DAO.Database
    Dim rsS As DAO.Recordset, rsD As DAO.Recordset
    Dim strPrevState As String, strPrevZip As String, sngPrevRate As Single
    
    On Error GoTo ProcError
    
    Set db = CurrentDb
    Set rsS = db.OpenRecordset("SELECT * FROM tbl_Zip ORDER BY [State], [Zip]", , dbFailOnError)
    
    db.Execute "DELETE * FROM tbl_ZipNew"
    Set rsD = db.OpenRecordset("SELECT State, Zip_From, Zip_To, Rate FROM tbl_ZipNew", , dbFailOnError)

    While Not rsS.EOF
        If rsS!State = strPrevState And rsS!Rate = sngPrevRate Then
            strPrevZip = rsS!Zip
        Else
            rsD!Zip_To = strPrevZip
            rsD.Update
            rsD.AddNew
            rsD!State = rsS!State
            strPrevState = rsS!State
            rsD!Zip_From = rsS!Zip
            strPrevZip = rsS!Zip
            rsD!Rate = rsS!Rate
            sngPrevRate = rsS!Rate
        End If
        rsS.MoveNext
    Wend
    
ProcExit:
    rsS.Close
    Set rsS = Nothing
    
    rsD!Zip_To = strPrevZip
    rsD.Update
    rsD.Close
    Set rsD = Nothing
        
    Exit Sub
ProcError:
    If Err.Number = 3020 Then   'Update or CancelUpdate without AddNew or Edit.
        Resume Next
    Else
        MsgBox Err.Number & vbCrLf & Err.Description
        Resume ProcExit
    End If
    
End Sub

Open in new window

0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33750498
sample database attached.
ZipRanges.mdb
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 250 total points
ID: 33750782
Can do that in a straight query...



select state, min(zip) as start_zip, max(zip) as end_zip, rate
from (
      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate
     ) as SQ
Group by state, rate, next_rate_break
order by 1,2,3,4

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33750801
Oops, forgot to add the database back, and hope fyed doesnt mind too much that I borrowed it :D
ZipRanges.mdb
0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33752389
Mark,

I prefer SQL solutions over code, but I just cannot get my brain to work this early.  I'll have to take a look at this later.  The vba took me about 5 minutes, it would probably have taken me an hour to come up with the SQL.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33752485
G'Day fyed,

For me, it is the other way round :) And didnt mean anything by that comment "it can be done as a straight query" in reference to your code, just that it could also be done as a query. Hope I didnt offend. And yes, it was a bit cheeky to grab your database - ta muchly for posting it though :)
0
 
LVL 47

Expert Comment

by:Dale Fye (Access MVP)
ID: 33754109
No offense taken.  The point is to help the OP, and learn a bit at the same time.

I'm pretty good at nested queries, but start to get confused with 3rd order nesting.
0
 

Author Comment

by:gkhotline
ID: 33754624
You guys are good! It will take me some time to understand step by step what you did... How do you want me to split the points? Given the fact fyed gave his solution first... Just want to be fair...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33754749
However you like - it is your question :)  

And please just ask if there is anything I can help with explaining what the heck that query is really doing :) :)
0
 

Author Comment

by:gkhotline
ID: 33754860
Thank you Mark,
Would that be too much to ask to tell me step by step what you are doing? I could possible create a new question for this so that you dont waist your time for nothing.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33754939
No worries, and no need for another question... Explaination is an integral part of the solution :)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33755204
So, first step is understanding what the real problem is. Believe it or not, you actually had the solution, just didnt recognise it as a query...

>> That would've been easy as I could do group by state and group by zip query and then add 2 columns Min(zip) and Max(zip),

followed by :

>> BUT there are occasions when we could find in the same state there is another record...

So, the basis is we can use min and max except when there is a break point. That break point is an intercepting zip code with a different rate. And you provided that information, you knew my query before I did *laughing*.

Step 1 was to then start constructing the "raw" elements, gathering the necessary information so we can run a query.

What it meant in this case was starting with a fairly "raw" query to see if we can get those breakpoints.

      SELECT top 100 percent z.state,z.zip,z.rate,  '???????'  as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate

the challenge then becomes how do we find that breakpoint ? We might be able to join, but in reality, we are really looking for just the very next instance of a zip code with a different rate from the "current" row.

So we introduce an in-line query :

     (select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break

Now it doesnt run on its own because it needs the "current" row as a reference. We also dont have to be hugely concerned where it lives in the overall scheme of things (just yet) so long as we do know that for "this" current series, it will stop at "next_rate_break" if there is one.

Lets now incorporate that back into our query replacing the question marks:

      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate

And running that we see that the detail rows are in fact picking up an instance of "next break point" when there really is one. But the rows look a bit of a mess.

So, now we need to filter out that information, so we wrap up the above inside another query. ie, it becomes a subquery

select state, min(zip) as start_zip, max(zip) as end_zip, rate
from (
      SELECT top 100 percent z.state,z.zip,z.rate,(select top 1 n.zip from tbl_zip n where n.state = z.state and n.zip > z.zip and n.rate <> z.rate order by n.state, n.zip) as next_rate_break
      FROM tbl_Zip as Z
      ORDER BY z.state, z.zip, z.rate
     ) as SQ
Group by state, rate, next_rate_break
order by 1,2,3,4


Notice how we wrap up our supquery in brackets and give it an alias (a name) of "SQ". This is also where Access can be a bit tiresome because it reckons it knows better and ssometimes tries to replace some of our brackets with sqare brackets next time you edit it. Just be careful of that ;)

So, we now have an outer query selecting from our inner or subquery "SQ" and we need to (now) find the appropriate pieces from that raw information to report on.

Back to your original observations. We can use min and max except where there is a breakpoint. Guess what, we now know that there is a breakpoint, and can use that vital piece of information to help the MIN and MAX aggregate functions.

For our outer query, we already know we want to group things by State and rate, and as observed, that might be broken up because of those breakpoints, and so, that also must be included into our grouping...

   select state, min(zip) as start_zip, max(zip) as end_zip, rate
   from (...) as SQ
   Group by state, rate, next_rate_break
   order by 1,2,3,4

Which gives us the group by clause  "   Group by state, rate, next_rate_break " and all we really have to do know is get the state and rate, and then the min(zip) and max(zip) within that breakpoint.

Oh, and to finish it off we also have to sort it in some sequence and I tend to use the "column poistion" type notation rather than column names (seen in the order by)

So, it really becomes a compound select based on facts / information as we can gather...
1) get the "raw" rows (our subquery)
2) add in those breakpoints that will upset the sequence (ie a different rate within that state ) (add an inline query into our subquery)
3) Now select and present that information (the outer query)

Does that help explain it a bit more ?
0
 

Author Comment

by:gkhotline
ID: 33756283
Wow,
I didnt expect such a detailed answer! Thank you very much! I will go through it a little bit later, but it looks like if I can't understand this then all is lost for me.
Again, thank you! I will post my comments later...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 33758299
No problems....

I do apologise for all the typos in there - and didnt explain why I was selecting top 100 percent.

That part is an old habit. In SQL if you use an ORDER BY then you normally need a selection qualifier for that ORDER BY to work within a sub query or view in a lot of different databases (e.g. MS SQL Server). So, when / if using an order by as part of a subquery, then I normally throw in the top 100 percent.

In reality I am not so sure that the order by nor the top 100 percent is actually required in the subquery apart from wanting to make sure that we do have all the postcodes in correct sequence for the state (regardless of when the postcodes were added to the table).

In those type of situations, if you are not sure what the sequence will really be, then always best to take control of it yourself to avoid any unexpected results. In the above, however, the inline query will always get the "next-break-point" regardless of sequence because that too is also using an order by and so could get some performance improvement if we really needed to by testing the need for the "top 100 percent" and "order by" within the "SQ" subquery (it is definitely needed in the inline query).

Also, if you have any "where" qualifiers then you would first look to add them to the "SQ" subquery part to minimise the amount of "raw" data, or, you can add it to the end as well if it is performing OK given the volume of data you have.

See ? Always room for more explanation, so, if it doesnt make sense, simply ask away :)
0
 
LVL 30

Expert Comment

by:hnasr
ID: 33758784
I go with mark_wills's comment ID:33750782. It is easier for me to check. Did not check other comments, whicj may give you the same accepted result.

If you still need more clarification to this comment, I may help.
0
 

Author Closing Comment

by:gkhotline
ID: 33788576
Thank you for your solutions and explanations. Very, very helpful!
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It’s the first day of March, the weather is starting to warm up and the excitement of the upcoming St. Patrick’s Day holiday can be felt throughout the world.
You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

737 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question