<

Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x

Beware of Between

Published on
77,894 Points
6,694 Views
22 Endorsements
Last Modified:
Awarded
PortletPaul
More years at various grindstones than I care to remember. Some interesting grindstones though: Shipbuilding, Aerospace ..... IT
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common with most definitions is that BETWEEN equates to:

[expression] >= [low boundary] and [expression] <= [high boundary]

Note there are 2 equal signs! This means the [low boundary] and [high boundary] are BOTH included. Also note there is a definite order, [low boundary] must be given first or there will not be any matches at all.

This 'inclusive' definition may not agree with your generally held interpretation of "between". For example if I drive my car between 2 trees I don't want to include the trees (else I have tree damage and/or car damage). But if I catch a jet between New York and London, both cities are included (to catch the jet and to land the jet). So a "car between trees" is different to a "jet between cities"; in SQL 'between' is used in the 'inclusive' sense of "jet between cities".

Below we examine BETWEEN in more depth using a set of queries, and of course we need some data for this:sample data, 10 rows, "raw_data" = 10.1, Category  is 1 char (A to Z) and some datesThis first query is deliberately 'odd', it asks for category between ‘A’ and ‘A’ and this may not seem logical (if you hang on to a "car between trees" viewpoint). Let’s test it:
select
id, category, raw_data
from sample
where 
   category BETWEEN 'A' AND 'A' /* note! [value1] AND [value1] */
OR category BETWEEN 'L' AND 'L' /* note! [value1] AND [value1] */
OR category BETWEEN 'Z' AND 'Z' /* note! [value1] AND [value1] */

Open in new window

A between A and A? L between L and L? Z between Z and Z?So, proof that ‘A’ is BETWEEN ‘A’ AND ‘A’ (ditto L and Z). What we really see here is the effect of EQUALS in the SQL definition of BETWEEN. ‘A’ does equal ‘A’ (and does equal ‘A’). So; keep in mind that BETWEEN includes both boundaries!

What about dates/times/numbers? Will the same type of effect happen? Let’s try a similar silly query on a date field:
select
id, sold_date, raw_data
from sample
where 
   sold_date BETWEEN to_date('2013/01/01','YYYY/MM/DD') 
                 AND to_date('2013/01/01','YYYY/MM/DD') 
OR sold_date BETWEEN to_date('2013/02/01','YYYY/MM/DD') 
                 AND to_date('2013/02/01','YYYY/MM/DD')
OR sold_date BETWEEN to_date('2013/03/01','YYYY/MM/DD') 
                 AND to_date('2013/03/01','YYYY/MM/DD')

Open in new window

Jan 1 between Jan 1 and Jan1?  etc.Yes! BETWEEN is consistent, if there are exact matches to the provided boundaries of dates/times/numbers they will be included.

So, what is the problem with this? You need to beware of overlapping! Let’s investigate this a little. The next query uses some case expressions to move raw data into columns using BETWEEN to mimic a simple report, with the 'RSLT' column indicating good or bad things.
select
id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
, case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , category
        , raw_data
        , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
        , case when category between 'L' and 'Z' then raw_data else 0 end as btwn_Z
        from sample
     ) inq
order by category

Open in new window

Overlapping 'buckets' create calculation errorWhat we see in this result is the ‘L’ is matched by two BETWEEN expressions and now there is calculation error because we would be double-counting any matches to ‘L’.

Luckily this is easily solved! Make the BETWEEN expressions “mutually exclusive”. So instead of using ‘L’ twice, we move to the next letter ‘M’ and the expressions are now mutually exclusive.
select
id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
, case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , category
        , raw_data
        , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
        , case when category between 'M' and 'Z' then raw_data else 0 end as btwn_Z /* correction to 'M' in this line */
        from sample
     ) inq
order by category

Open in new window

Mutually Exclusive 'buckets' give correct resultsSo, mutually exclusive ‘buckets’ are a good thing. Note (not shown) if we happened to use ‘N’ instead of ‘M’ we would create a gap, so not only should the buckets be mutually exclusive, it is equally important that they are don’t allow gaps to occur.

Do the same sorts of problems occur with dates? ABSOLUTELY!
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping 'buckets' can occur with Date boundariesWe saw before that by making the ‘buckets’ mutually exclusive we solve this overlapping problem. But how is this done with dates? There are 2 methods for this:
## Deducting an interval from the [high_boundary]
## Not using BETWEEN ... AND

1. Deducting an Interval from the [high_boundary]

This is a commonly used method – but, there are potential difficulties that are discussed later.  The next query deducts one day from each of the [high_boundaries]
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping Solved by deducting one (day)Excellent! No overlapping and no calculation errors.

2. Not Using BETWEEN ... AND

This may seem an extreme option, but really it isn't. Going back to the definition of BETWEEN ... AND we know this equates to:
[expression] >= [low_boundary] AND [expression] <= [high_boundary]

If you recall the previous approach we succeeded by deducting one from the [high_boundary] and it then becomes less than the next [low_boundary]; well the very same effect would be to use the following conditions instead:
[expression] >= [low_boundary] AND [expression] < [high_boundary]

So, lets’ run a query without using BETWEEN:
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
        , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
        , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping Solved by using Less Than on High BoundaryExcellent again! No overlapping and no calculation error (again).

So, which method is better? The answer is “it depends” often by preference, attitude, or standards. My own view is, when using date/time information, it is often safer and more reliable to use the LESS THAN technique (not using BETWEEN).

Why? Well let’s, introduce one new record into our sample data. This new record (Id=11) has time information in the sold_date field.
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

But, Deduct One Day can produce Gaps! On No!Oh No! A new type of error; Now there is a record that would be missed in calculations (a GAP in the ‘buckets’).

What caused this AND, how do I fix this? The cause is the “minus one”, when we deduct 1 from a date we are deducting “a whole day” so any time from 00:00:00 to 23:59:59:999999999999(recurring) of a day is now missing from our [high_boundary].

So what would we deduct instead? Well this is where it gets potentially tricky. If the data was only whole dates -1 is OK, we saw that before time was introduced into the data. If the data involves time, to the precision of 1 second, then we can deduct 1 second. If the time is more precise, say a millisecond, then we can deduct 1 millisecond. In other words you need to be aware of the data precision before choosing the right interval to deduct.

Before you start looking through your data type documentation let's review the next query covering the newly introduced record which does not use BETWEEN ... AND:
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
from (
    select
    id
    , sold_date
    , category
    , raw_data
    , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
    , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
    , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
    from sample
     ) inq
order by inq.sold_date

Open in new window

No Gaps by Using Less Than on High Boundary (no 'between' in query')Look Ma, no hands! No query modification needed and calculations are correct (as no gap exists).

So. Let’s recap.
BETWEEN ... AND is defined to INCLUDE both boundaries.
We saw that when using a string field (category) we could solve issues by making the ‘buckets’ mutually exclusive.
Then there are 2 techniques with date/time related information to make the buckets mutually exclusive:
Deduct an interval from the high_boundary, but we also saw that we must choose the correct interval precision.
OR: Don’t use BETWEEN ... AND, instead use
[expression] >= [low_boundary] AND [expression] < [high_boundary]
, plus we saw that in this approach we don’t have to choose an interval precision.

Ultimately it is your choice which technique to use: but hopefully you understand the need for choosing one of these techniques.

POSTSCRIPT

It is sometimes said that using BETWEEN ... AND makes for cleaner/neater more comprehensible code. So let’s go for just one more query to test this. As discussed above if we deduct the correct unit we can continue using BETWEEN ... AND. Now with this in mind let’s deduct one second instead of one day. Oh, there are options here too but I’m going with a traditional approach; 1/(24*60*60) equates to 1 second. So;
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) _' end as rslt
from (
    select
    id
    , sold_date
    , category
    , raw_data
    , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                              and to_date('2013/02/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_1
    , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                              and to_date('2013/03/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_2
    , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                              and to_date('2013/04/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_3
    from sample
     ) inq
order by inq.sold_date

Open in new window

No Gaps by Deducting Correct Unit of Precision (eg. 1 second)Voila! The right results by deducting the right interval!

Now we can compare 2 functionally equivalent snippets of code:You decide: Which is 'cleaner'? (sample only! yes, there are many alternatives!)Do you see cleaner code (either way)? To me it’s either a tie, or just maybe it swings of favour of  the using LESS THAN?

CONCLUSION

For date/time related information regardless of data type, when using BETWEEN there is potential for ‘buckets’ to overlap and calculation errors to occur due to this. Take care to ensure that your ‘buckets’ are mutually exclusive either by adjusting one of the boundaries (usually the high) – OR – adopt the alternative which is to specifically use [expression] >= [low boundary] and [expression] < [high boundary] in your query code (not using BETWEEN).

Also: Don’t forget to avoid gaps. Particularly gaps you might introduce by deducting an inappropriately large time interval if using that technique.

REFERENCES

The intended focus of this article is "between" in Oracle/MSSQL, but it is applicable also to mySQL/Access

between @Oracle9i Lite SQL Reference
between (Oracle 9i) Cost Based Optimizer
between mySQL 5.0: expr between min and max
between (Transact-SQL) [2005]
between...and Operator Access 2003
between @w3schools.com (this indicates that some databases use a different definition for between)

between @Merriam-Webster @wiktionary.org
Beware-of-Between.ORA.sql
Beware-of-Between.MSSQL.sql
22
Comment
Author:PortletPaul
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
12 Comments
 
LVL 51

Expert Comment

by:Keith Alabaster
Nice article :)
0
 
LVL 111

Expert Comment

by:Ray Paseur
Great article!
0
 
LVL 60

Expert Comment

by:Kevin Cross
Welcome! Very nice first month here at EE. Glad you decided to share your knowledge with the community. You have my YES (+1) vote!
0
Application Discovery Service in AWS

In the era of the cloud, customers migrating away from their existing on-premise infrastructure. This requires lots of planning, strategies, and effort to identify their existing resources and determine how best to migrate.  Datacenter migrations happen in four phases -

 
LVL 49

Author Comment

by:PortletPaul
Thanks folks - it's  a learning experience at me end as well. Lots of exposure to issue types I don't get in my day job. Cheers.
0
 
LVL 23

Expert Comment

by:Ioannis Paraskevopoulos
That is a nice article and changed the way i am writing queries. I used to like the BETWEEN statement because it was more straightforward, and i used to frown upon queries that used the gt - lt syntax, as they were supposed to be the same functionality but uglier.

Well done.
0
 
LVL 49

Author Comment

by:PortletPaul
Giannis,

Thank you! High praise indeed from someone I implicitly trust and respect. I am certain you will find avoiding "between ... and" will aid in more predictable and precise queries.

Cheers & regards,
Paul
0
 
LVL 37

Expert Comment

by:ValentinoV
Congrats with your article!  Just gave you a Yes vote which brings the total to 10 so you should be seeing your first Community Pick soon! :)

Ow, and welcome to EE!
0
 
LVL 49

Author Comment

by:PortletPaul
Thank you! & its a real pleasure to be here.

Regards, Paul
0
 
LVL 66

Expert Comment

by:Jim Horn
Very good read.  Voted yes.

>If the data involves time, to the precision of 1 second, then we can deduct 1 second.
I've chased down many a prod support ticket where developers failed to realize that 1-1-2013 translates to 1-1-2013 00:00:00, and wouldn't include 1-1-2013 12:34:56.
0
 
LVL 49

Author Comment

by:PortletPaul
:) Jim, thanks. Like you, I have wasted hours fixing such issues and have seen millions of $ "disappear" because some report writers actually believe between is dedicated for use with dates and don't appreciate that the duration of days can be excluded. I just keep plugging away in the hope the message gets through.
0
 
LVL 52

Expert Comment

by:Vitor Montalvão
Excellent article. Gave my vote of course.
Cheers
0
 
LVL 26

Expert Comment

by:Nick67
There's also the fun and hellacious problem of how dates are displayed in Access.
You can make a field up as DateTime and format it as "dd-mmm-yyyy"
It all looks good.  But it is stored as a Double with a Long integer value based on the number of days elapsed since 30-Dec-1899 and the decimal as a fraction of the day as the time.
Now, add some values to it using Date() and others using Now() and watch the fun ensue as you use BETWEEN...AND and get some records in at the  'boundary' and others out.
Those entered via Date() will be dd-mmm-yyyy 12:00:00 AM and will be in.
Those enter via Now() will be dd-mmm-yyyy and some time component and be out.

That trips up a lot of folks!
0

Featured Post

Stack Overflow Podcast - Developer Story

Welcome to the Stack Overflow podcast recorded Thursday July 20 at Stack Overflow Headquearters in NYC. Your hosts today are podcast regulars Jay Hanlon, David Fullerton, and Ilana Yitzhaki, plus the quite irregular Matt Sherman (Stack Overflow Engineering Manager extraordinaire)

Join & Write a Comment

Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…
In response to a need for security and privacy, and to continue fostering an environment members can turn to for support, solutions, and education, Experts Exchange has created anonymous question capabilities. This new feature is available to our Pr…
Suggested Courses

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month