Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

Beware of Between

PortletPaulEE Topic Advisor
CERTIFIED EXPERT
More years at various grindstones than I care to remember. Some interesting grindstones though: Shipbuilding, Aerospace ..... IT
Published:
Updated:
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common with most definitions is that BETWEEN equates to:

[expression] >= [low boundary] and [expression] <= [high boundary]

Note there are 2 equal signs! This means the [low boundary] and [high boundary] are BOTH included. Also note there is a definite order, [low boundary] must be given first or there will not be any matches at all.

This 'inclusive' definition may not agree with your generally held interpretation of "between". For example if I drive my car between 2 trees I don't want to include the trees (else I have tree damage and/or car damage). But if I catch a jet between New York and London, both cities are included (to catch the jet and to land the jet). So a "car between trees" is different to a "jet between cities"; in SQL 'between' is used in the 'inclusive' sense of "jet between cities".

Below we examine BETWEEN in more depth using a set of queries, and of course we need some data for this:sample data, 10 rows, "raw_data" = 10.1, Category  is 1 char (A to Z) and some datesThis first query is deliberately 'odd', it asks for category between ‘A’ and ‘A’ and this may not seem logical (if you hang on to a "car between trees" viewpoint). Let’s test it:
select
                      id, category, raw_data
                      from sample
                      where 
                         category BETWEEN 'A' AND 'A' /* note! [value1] AND [value1] */
                      OR category BETWEEN 'L' AND 'L' /* note! [value1] AND [value1] */
                      OR category BETWEEN 'Z' AND 'Z' /* note! [value1] AND [value1] */

Open in new window

A between A and A? L between L and L? Z between Z and Z?So, proof that ‘A’ is BETWEEN ‘A’ AND ‘A’ (ditto L and Z). What we really see here is the effect of EQUALS in the SQL definition of BETWEEN. ‘A’ does equal ‘A’ (and does equal ‘A’). So; keep in mind that BETWEEN includes both boundaries!

What about dates/times/numbers? Will the same type of effect happen? Let’s try a similar silly query on a date field:
select
                      id, sold_date, raw_data
                      from sample
                      where 
                         sold_date BETWEEN to_date('2013/01/01','YYYY/MM/DD') 
                                       AND to_date('2013/01/01','YYYY/MM/DD') 
                      OR sold_date BETWEEN to_date('2013/02/01','YYYY/MM/DD') 
                                       AND to_date('2013/02/01','YYYY/MM/DD')
                      OR sold_date BETWEEN to_date('2013/03/01','YYYY/MM/DD') 
                                       AND to_date('2013/03/01','YYYY/MM/DD')

Open in new window

Jan 1 between Jan 1 and Jan1?  etc.Yes! BETWEEN is consistent, if there are exact matches to the provided boundaries of dates/times/numbers they will be included.

So, what is the problem with this? You need to beware of overlapping! Let’s investigate this a little. The next query uses some case expressions to move raw data into columns using BETWEEN to mimic a simple report, with the 'RSLT' column indicating good or bad things.
select
                      id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
                      , case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
                      from (
                              select
                              id
                              , category
                              , raw_data
                              , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
                              , case when category between 'L' and 'Z' then raw_data else 0 end as btwn_Z
                              from sample
                           ) inq
                      order by category

Open in new window

Overlapping 'buckets' create calculation errorWhat we see in this result is the ‘L’ is matched by two BETWEEN expressions and now there is calculation error because we would be double-counting any matches to ‘L’.

Luckily this is easily solved! Make the BETWEEN expressions “mutually exclusive”. So instead of using ‘L’ twice, we move to the next letter ‘M’ and the expressions are now mutually exclusive.
select
                      id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
                      , case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
                      from (
                              select
                              id
                              , category
                              , raw_data
                              , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
                              , case when category between 'M' and 'Z' then raw_data else 0 end as btwn_Z /* correction to 'M' in this line */
                              from sample
                           ) inq
                      order by category

Open in new window

Mutually Exclusive 'buckets' give correct resultsSo, mutually exclusive ‘buckets’ are a good thing. Note (not shown) if we happened to use ‘N’ instead of ‘M’ we would create a gap, so not only should the buckets be mutually exclusive, it is equally important that they are don’t allow gaps to occur.

Do the same sorts of problems occur with dates? ABSOLUTELY!
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '       :)' end as rslt
                      from (
                              select
                              id
                              , sold_date
                              , category
                              , raw_data
                              , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                                        and to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
                              , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                                        and to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
                              , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                                        and to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
                              from sample
                           ) inq
                      order by inq.sold_date

Open in new window

Overlapping 'buckets' can occur with Date boundariesWe saw before that by making the ‘buckets’ mutually exclusive we solve this overlapping problem. But how is this done with dates? There are 2 methods for this:
## Deducting an interval from the [high_boundary]
## Not using BETWEEN ... AND

1. Deducting an Interval from the [high_boundary]

This is a commonly used method – but, there are potential difficulties that are discussed later.  The next query deducts one day from each of the [high_boundaries]
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
                      from (
                              select
                              id
                              , sold_date
                              , category
                              , raw_data
                              , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                                        and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
                              , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                                        and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
                              , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                                        and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
                              from sample
                           ) inq
                      order by inq.sold_date

Open in new window

Overlapping Solved by deducting one (day)Excellent! No overlapping and no calculation errors.

2. Not Using BETWEEN ... AND

This may seem an extreme option, but really it isn't. Going back to the definition of BETWEEN ... AND we know this equates to:
[expression] >= [low_boundary] AND [expression] <= [high_boundary]

If you recall the previous approach we succeeded by deducting one from the [high_boundary] and it then becomes less than the next [low_boundary]; well the very same effect would be to use the following conditions instead:
[expression] >= [low_boundary] AND [expression] < [high_boundary]

So, lets’ run a query without using BETWEEN:
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
                      from (
                              select
                              id
                              , sold_date
                              , category
                              , raw_data
                              , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
                                      and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
                              , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
                                      and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
                              , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
                                      and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
                              from sample
                           ) inq
                      order by inq.sold_date

Open in new window

Overlapping Solved by using Less Than on High BoundaryExcellent again! No overlapping and no calculation error (again).

So, which method is better? The answer is “it depends” often by preference, attitude, or standards. My own view is, when using date/time information, it is often safer and more reliable to use the LESS THAN technique (not using BETWEEN).

Why? Well let’s, introduce one new record into our sample data. This new record (Id=11) has time information in the sold_date field.
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
                      from (
                              select
                              id
                              , sold_date
                              , category
                              , raw_data
                              , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                                        and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
                              , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                                        and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
                              , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                                        and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
                              from sample
                           ) inq
                      order by inq.sold_date

Open in new window

But, Deduct One Day can produce Gaps! On No!Oh No! A new type of error; Now there is a record that would be missed in calculations (a GAP in the ‘buckets’).

What caused this AND, how do I fix this? The cause is the “minus one”, when we deduct 1 from a date we are deducting “a whole day” so any time from 00:00:00 to 23:59:59:999999999999(recurring) of a day is now missing from our [high_boundary].

So what would we deduct instead? Well this is where it gets potentially tricky. If the data was only whole dates -1 is OK, we saw that before time was introduced into the data. If the data involves time, to the precision of 1 second, then we can deduct 1 second. If the time is more precise, say a millisecond, then we can deduct 1 millisecond. In other words you need to be aware of the data precision before choosing the right interval to deduct.

Before you start looking through your data type documentation let's review the next query covering the newly introduced record which does not use BETWEEN ... AND:
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
                      from (
                          select
                          id
                          , sold_date
                          , category
                          , raw_data
                          , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
                                  and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
                          , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
                                  and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
                          , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
                                  and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
                          from sample
                           ) inq
                      order by inq.sold_date

Open in new window

No Gaps by Using Less Than on High Boundary (no 'between' in query')Look Ma, no hands! No query modification needed and calculations are correct (as no gap exists).

So. Let’s recap.
BETWEEN ... AND is defined to INCLUDE both boundaries.
We saw that when using a string field (category) we could solve issues by making the ‘buckets’ mutually exclusive.
Then there are 2 techniques with date/time related information to make the buckets mutually exclusive:
Deduct an interval from the high_boundary, but we also saw that we must choose the correct interval precision.
OR: Don’t use BETWEEN ... AND, instead use
[expression] >= [low_boundary] AND [expression] < [high_boundary]
, plus we saw that in this approach we don’t have to choose an interval precision.

Ultimately it is your choice which technique to use: but hopefully you understand the need for choosing one of these techniques.

POSTSCRIPT

It is sometimes said that using BETWEEN ... AND makes for cleaner/neater more comprehensible code. So let’s go for just one more query to test this. As discussed above if we deduct the correct unit we can continue using BETWEEN ... AND. Now with this in mind let’s deduct one second instead of one day. Oh, there are options here too but I’m going with a traditional approach; 1/(24*60*60) equates to 1 second. So;
select
                      inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
                      , case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) _' end as rslt
                      from (
                          select
                          id
                          , sold_date
                          , category
                          , raw_data
                          , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                                    and to_date('2013/02/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_1
                          , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                                    and to_date('2013/03/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_2
                          , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                                    and to_date('2013/04/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_3
                          from sample
                           ) inq
                      order by inq.sold_date

Open in new window

No Gaps by Deducting Correct Unit of Precision (eg. 1 second)Voila! The right results by deducting the right interval!

Now we can compare 2 functionally equivalent snippets of code:You decide: Which is 'cleaner'? (sample only! yes, there are many alternatives!)Do you see cleaner code (either way)? To me it’s either a tie, or just maybe it swings of favour of  the using LESS THAN?

CONCLUSION

For date/time related information regardless of data type, when using BETWEEN there is potential for ‘buckets’ to overlap and calculation errors to occur due to this. Take care to ensure that your ‘buckets’ are mutually exclusive either by adjusting one of the boundaries (usually the high) – OR – adopt the alternative which is to specifically use [expression] >= [low boundary] and [expression] < [high boundary] in your query code (not using BETWEEN).

Also: Don’t forget to avoid gaps. Particularly gaps you might introduce by deducting an inappropriately large time interval if using that technique.

REFERENCES

The intended focus of this article is "between" in Oracle/MSSQL, but it is applicable also to mySQL/Access

between @Oracle9i Lite SQL Reference
between (Oracle 9i) Cost Based Optimizer
between mySQL 5.0: expr between min and max
between (Transact-SQL) [2005]
between...and Operator Access 2003
between @w3schools.com (this indicates that some databases use a different definition for between)

between @Merriam-Webster @wiktionary.org
Beware-of-Between.ORA.sql
Beware-of-Between.MSSQL.sql
22
11,003 Views
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
More years at various grindstones than I care to remember. Some interesting grindstones though: Shipbuilding, Aerospace ..... IT

Comments (12)

PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Author

Commented:
Thank you! & its a real pleasure to be here.

Regards, Paul
Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Very good read.  Voted yes.

>If the data involves time, to the precision of 1 second, then we can deduct 1 second.
I've chased down many a prod support ticket where developers failed to realize that 1-1-2013 translates to 1-1-2013 00:00:00, and wouldn't include 1-1-2013 12:34:56.
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Author

Commented:
:) Jim, thanks. Like you, I have wasted hours fixing such issues and have seen millions of $ "disappear" because some report writers actually believe between is dedicated for use with dates and don't appreciate that the duration of days can be excluded. I just keep plugging away in the hope the message gets through.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Excellent article. Gave my vote of course.
Cheers
CERTIFIED EXPERT
Most Valuable Expert 2014

Commented:
There's also the fun and hellacious problem of how dates are displayed in Access.
You can make a field up as DateTime and format it as "dd-mmm-yyyy"
It all looks good.  But it is stored as a Double with a Long integer value based on the number of days elapsed since 30-Dec-1899 and the decimal as a fraction of the day as the time.
Now, add some values to it using Date() and others using Now() and watch the fun ensue as you use BETWEEN...AND and get some records in at the  'boundary' and others out.
Those entered via Date() will be dd-mmm-yyyy 12:00:00 AM and will be in.
Those enter via Now() will be dd-mmm-yyyy and some time component and be out.

That trips up a lot of folks!

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.