<

Go Premium for a chance to win a PS4. Enter to Win

x

Beware of Between

Published on
78,090 Points
6,890 Views
22 Endorsements
Last Modified:
Awarded
PortletPaul
More years at various grindstones than I care to remember. Some interesting grindstones though: Shipbuilding, Aerospace ..... IT
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common with most definitions is that BETWEEN equates to:

[expression] >= [low boundary] and [expression] <= [high boundary]

Note there are 2 equal signs! This means the [low boundary] and [high boundary] are BOTH included. Also note there is a definite order, [low boundary] must be given first or there will not be any matches at all.

This 'inclusive' definition may not agree with your generally held interpretation of "between". For example if I drive my car between 2 trees I don't want to include the trees (else I have tree damage and/or car damage). But if I catch a jet between New York and London, both cities are included (to catch the jet and to land the jet). So a "car between trees" is different to a "jet between cities"; in SQL 'between' is used in the 'inclusive' sense of "jet between cities".

Below we examine BETWEEN in more depth using a set of queries, and of course we need some data for this:sample data, 10 rows, "raw_data" = 10.1, Category  is 1 char (A to Z) and some datesThis first query is deliberately 'odd', it asks for category between ‘A’ and ‘A’ and this may not seem logical (if you hang on to a "car between trees" viewpoint). Let’s test it:
select
id, category, raw_data
from sample
where 
   category BETWEEN 'A' AND 'A' /* note! [value1] AND [value1] */
OR category BETWEEN 'L' AND 'L' /* note! [value1] AND [value1] */
OR category BETWEEN 'Z' AND 'Z' /* note! [value1] AND [value1] */

Open in new window

A between A and A? L between L and L? Z between Z and Z?So, proof that ‘A’ is BETWEEN ‘A’ AND ‘A’ (ditto L and Z). What we really see here is the effect of EQUALS in the SQL definition of BETWEEN. ‘A’ does equal ‘A’ (and does equal ‘A’). So; keep in mind that BETWEEN includes both boundaries!

What about dates/times/numbers? Will the same type of effect happen? Let’s try a similar silly query on a date field:
select
id, sold_date, raw_data
from sample
where 
   sold_date BETWEEN to_date('2013/01/01','YYYY/MM/DD') 
                 AND to_date('2013/01/01','YYYY/MM/DD') 
OR sold_date BETWEEN to_date('2013/02/01','YYYY/MM/DD') 
                 AND to_date('2013/02/01','YYYY/MM/DD')
OR sold_date BETWEEN to_date('2013/03/01','YYYY/MM/DD') 
                 AND to_date('2013/03/01','YYYY/MM/DD')

Open in new window

Jan 1 between Jan 1 and Jan1?  etc.Yes! BETWEEN is consistent, if there are exact matches to the provided boundaries of dates/times/numbers they will be included.

So, what is the problem with this? You need to beware of overlapping! Let’s investigate this a little. The next query uses some case expressions to move raw data into columns using BETWEEN to mimic a simple report, with the 'RSLT' column indicating good or bad things.
select
id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
, case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , category
        , raw_data
        , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
        , case when category between 'L' and 'Z' then raw_data else 0 end as btwn_Z
        from sample
     ) inq
order by category

Open in new window

Overlapping 'buckets' create calculation errorWhat we see in this result is the ‘L’ is matched by two BETWEEN expressions and now there is calculation error because we would be double-counting any matches to ‘L’.

Luckily this is easily solved! Make the BETWEEN expressions “mutually exclusive”. So instead of using ‘L’ twice, we move to the next letter ‘M’ and the expressions are now mutually exclusive.
select
id, category, raw_data, btwn_A, btwn_Z, btwn_A + btwn_Z as total
, case when (btwn_A + btwn_Z) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , category
        , raw_data
        , case when category between 'A' and 'L' then raw_data else 0 end as btwn_A
        , case when category between 'M' and 'Z' then raw_data else 0 end as btwn_Z /* correction to 'M' in this line */
        from sample
     ) inq
order by category

Open in new window

Mutually Exclusive 'buckets' give correct resultsSo, mutually exclusive ‘buckets’ are a good thing. Note (not shown) if we happened to use ‘N’ instead of ‘M’ we would create a gap, so not only should the buckets be mutually exclusive, it is equally important that they are don’t allow gaps to occur.

Do the same sorts of problems occur with dates? ABSOLUTELY!
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '       :)' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping 'buckets' can occur with Date boundariesWe saw before that by making the ‘buckets’ mutually exclusive we solve this overlapping problem. But how is this done with dates? There are 2 methods for this:
## Deducting an interval from the [high_boundary]
## Not using BETWEEN ... AND

1. Deducting an Interval from the [high_boundary]

This is a commonly used method – but, there are potential difficulties that are discussed later.  The next query deducts one day from each of the [high_boundaries]
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping Solved by deducting one (day)Excellent! No overlapping and no calculation errors.

2. Not Using BETWEEN ... AND

This may seem an extreme option, but really it isn't. Going back to the definition of BETWEEN ... AND we know this equates to:
[expression] >= [low_boundary] AND [expression] <= [high_boundary]

If you recall the previous approach we succeeded by deducting one from the [high_boundary] and it then becomes less than the next [low_boundary]; well the very same effect would be to use the following conditions instead:
[expression] >= [low_boundary] AND [expression] < [high_boundary]

So, lets’ run a query without using BETWEEN:
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
        , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
        , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
                and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

Overlapping Solved by using Less Than on High BoundaryExcellent again! No overlapping and no calculation error (again).

So, which method is better? The answer is “it depends” often by preference, attitude, or standards. My own view is, when using date/time information, it is often safer and more reliable to use the LESS THAN technique (not using BETWEEN).

Why? Well let’s, introduce one new record into our sample data. This new record (Id=11) has time information in the sold_date field.
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) .' end as rslt
from (
        select
        id
        , sold_date
        , category
        , raw_data
        , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                                  and to_date('2013/02/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_1
        , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                                  and to_date('2013/03/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_2
        , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                                  and to_date('2013/04/01','YYYY/MM/DD')-1 then raw_data else 0 end as mth_3
        from sample
     ) inq
order by inq.sold_date

Open in new window

But, Deduct One Day can produce Gaps! On No!Oh No! A new type of error; Now there is a record that would be missed in calculations (a GAP in the ‘buckets’).

What caused this AND, how do I fix this? The cause is the “minus one”, when we deduct 1 from a date we are deducting “a whole day” so any time from 00:00:00 to 23:59:59:999999999999(recurring) of a day is now missing from our [high_boundary].

So what would we deduct instead? Well this is where it gets potentially tricky. If the data was only whole dates -1 is OK, we saw that before time was introduced into the data. If the data involves time, to the precision of 1 second, then we can deduct 1 second. If the time is more precise, say a millisecond, then we can deduct 1 millisecond. In other words you need to be aware of the data precision before choosing the right interval to deduct.

Before you start looking through your data type documentation let's review the next query covering the newly introduced record which does not use BETWEEN ... AND:
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) ..' end as rslt
from (
    select
    id
    , sold_date
    , category
    , raw_data
    , case when sold_date >= to_date('2013/01/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/02/01','YYYY/MM/DD') then raw_data else 0 end as mth_1
    , case when sold_date >= to_date('2013/02/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/03/01','YYYY/MM/DD') then raw_data else 0 end as mth_2
    , case when sold_date >= to_date('2013/03/01','YYYY/MM/DD')
            and sold_date <  to_date('2013/04/01','YYYY/MM/DD') then raw_data else 0 end as mth_3
    from sample
     ) inq
order by inq.sold_date

Open in new window

No Gaps by Using Less Than on High Boundary (no 'between' in query')Look Ma, no hands! No query modification needed and calculations are correct (as no gap exists).

So. Let’s recap.
BETWEEN ... AND is defined to INCLUDE both boundaries.
We saw that when using a string field (category) we could solve issues by making the ‘buckets’ mutually exclusive.
Then there are 2 techniques with date/time related information to make the buckets mutually exclusive:
Deduct an interval from the high_boundary, but we also saw that we must choose the correct interval precision.
OR: Don’t use BETWEEN ... AND, instead use
[expression] >= [low_boundary] AND [expression] < [high_boundary]
, plus we saw that in this approach we don’t have to choose an interval precision.

Ultimately it is your choice which technique to use: but hopefully you understand the need for choosing one of these techniques.

POSTSCRIPT

It is sometimes said that using BETWEEN ... AND makes for cleaner/neater more comprehensible code. So let’s go for just one more query to test this. As discussed above if we deduct the correct unit we can continue using BETWEEN ... AND. Now with this in mind let’s deduct one second instead of one day. Oh, there are options here too but I’m going with a traditional approach; 1/(24*60*60) equates to 1 second. So;
select
inq.id, inq.sold_date, mth_1, mth_2, mth_3, (mth_1 + mth_2 + mth_3) as total
, case when (mth_1 + mth_2 + mth_3) <> raw_data then '  WRONG :(' else '      :) _' end as rslt
from (
    select
    id
    , sold_date
    , category
    , raw_data
    , case when sold_date between to_date('2013/01/01','YYYY/MM/DD')
                              and to_date('2013/02/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_1
    , case when sold_date between to_date('2013/02/01','YYYY/MM/DD')
                              and to_date('2013/03/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_2
    , case when sold_date between to_date('2013/03/01','YYYY/MM/DD')
                              and to_date('2013/04/01','YYYY/MM/DD')-1/(24*60*60) then raw_data else 0 end as mth_3
    from sample
     ) inq
order by inq.sold_date

Open in new window

No Gaps by Deducting Correct Unit of Precision (eg. 1 second)Voila! The right results by deducting the right interval!

Now we can compare 2 functionally equivalent snippets of code:You decide: Which is 'cleaner'? (sample only! yes, there are many alternatives!)Do you see cleaner code (either way)? To me it’s either a tie, or just maybe it swings of favour of  the using LESS THAN?

CONCLUSION

For date/time related information regardless of data type, when using BETWEEN there is potential for ‘buckets’ to overlap and calculation errors to occur due to this. Take care to ensure that your ‘buckets’ are mutually exclusive either by adjusting one of the boundaries (usually the high) – OR – adopt the alternative which is to specifically use [expression] >= [low boundary] and [expression] < [high boundary] in your query code (not using BETWEEN).

Also: Don’t forget to avoid gaps. Particularly gaps you might introduce by deducting an inappropriately large time interval if using that technique.

REFERENCES

The intended focus of this article is "between" in Oracle/MSSQL, but it is applicable also to mySQL/Access

between @Oracle9i Lite SQL Reference
between (Oracle 9i) Cost Based Optimizer
between mySQL 5.0: expr between min and max
between (Transact-SQL) [2005]
between...and Operator Access 2003
between @w3schools.com (this indicates that some databases use a different definition for between)

between @Merriam-Webster @wiktionary.org
Beware-of-Between.ORA.sql
Beware-of-Between.MSSQL.sql
22
Comment
Author:PortletPaul
12 Comments
 
LVL 51

Expert Comment

by:Keith Alabaster
Nice article :)
0
 
LVL 111

Expert Comment

by:Ray Paseur
Great article!
0
 
LVL 60

Expert Comment

by:Kevin Cross
Welcome! Very nice first month here at EE. Glad you decided to share your knowledge with the community. You have my YES (+1) vote!
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 49

Author Comment

by:PortletPaul
Thanks folks - it's  a learning experience at me end as well. Lots of exposure to issue types I don't get in my day job. Cheers.
0
 
LVL 23

Expert Comment

by:Ioannis Paraskevopoulos
That is a nice article and changed the way i am writing queries. I used to like the BETWEEN statement because it was more straightforward, and i used to frown upon queries that used the gt - lt syntax, as they were supposed to be the same functionality but uglier.

Well done.
0
 
LVL 49

Author Comment

by:PortletPaul
Giannis,

Thank you! High praise indeed from someone I implicitly trust and respect. I am certain you will find avoiding "between ... and" will aid in more predictable and precise queries.

Cheers & regards,
Paul
0
 
LVL 37

Expert Comment

by:ValentinoV
Congrats with your article!  Just gave you a Yes vote which brings the total to 10 so you should be seeing your first Community Pick soon! :)

Ow, and welcome to EE!
0
 
LVL 49

Author Comment

by:PortletPaul
Thank you! & its a real pleasure to be here.

Regards, Paul
0
 
LVL 66

Expert Comment

by:Jim Horn
Very good read.  Voted yes.

>If the data involves time, to the precision of 1 second, then we can deduct 1 second.
I've chased down many a prod support ticket where developers failed to realize that 1-1-2013 translates to 1-1-2013 00:00:00, and wouldn't include 1-1-2013 12:34:56.
0
 
LVL 49

Author Comment

by:PortletPaul
:) Jim, thanks. Like you, I have wasted hours fixing such issues and have seen millions of $ "disappear" because some report writers actually believe between is dedicated for use with dates and don't appreciate that the duration of days can be excluded. I just keep plugging away in the hope the message gets through.
0
 
LVL 52

Expert Comment

by:Vitor Montalvão
Excellent article. Gave my vote of course.
Cheers
0
 
LVL 26

Expert Comment

by:Nick67
There's also the fun and hellacious problem of how dates are displayed in Access.
You can make a field up as DateTime and format it as "dd-mmm-yyyy"
It all looks good.  But it is stored as a Double with a Long integer value based on the number of days elapsed since 30-Dec-1899 and the decimal as a fraction of the day as the time.
Now, add some values to it using Date() and others using Now() and watch the fun ensue as you use BETWEEN...AND and get some records in at the  'boundary' and others out.
Those entered via Date() will be dd-mmm-yyyy 12:00:00 AM and will be in.
Those enter via Now() will be dd-mmm-yyyy and some time component and be out.

That trips up a lot of folks!
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Join & Write a Comment

This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…
Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video. If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, …
Suggested Courses

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month