Tuesday, 9 October 2012

BETWEEN Clause Vs [ >= AND <= ]


Introduction

In this article I am trying to discuss related to BETWEEN clause and >= AND <= comparisons operators and which one is best.
Most of the junior developer has some bad concepts related to BETWEEN clause and they told, not to use it. In this article I am demon staring the BETWEEN clause and how good it is.

What is the Difference

Let's take an example to understand it properly.

Step-1 [ Create the Base Table ]

CREATE TABLE my_TestTab
       (ID    INT      NOT NULL IDENTITY(1,1) PRIMARY KEY,
        VAL   DATETIME NOT NULL);

Step-2 [ Inserting Records ]

INSERT INTO my_TestTab
      (VAL)
VALUES('05-01-2012'),
      ('05-07-2012'),
      ('05-11-2012'),
      ('05-15-2012'),
      ('05-22-2012'),
      ('05-23-2012'),
      ('05-25-2012'),
      ('05-27-2012'),
      ('05-28-2012');

Step-3 [ Now use Between Clause ]

SELECT *
FROM   my_TestTab
WHERE  VAL BETWEEN  '05-01-2012' AND  '05-28-2012';

Output:

ID            VAL
1              2012-05-01 00:00:00.000
2              2012-05-07 00:00:00.000
3              2012-05-11 00:00:00.000
4              2012-05-15 00:00:00.000
5              2012-05-22 00:00:00.000
6              2012-05-23 00:00:00.000
7              2012-05-25 00:00:00.000
8              2012-05-27 00:00:00.000
9              2012-05-28 00:00:00.000

Step-4 [ Now using >= AND <= ]

SELECT *
FROM   my_TestTab
WHERE  VAL >='05-01-2012' AND VAL<='05-28-2012'; 

Output:

ID            VAL
1              2012-05-01 00:00:00.000
2              2012-05-07 00:00:00.000
3              2012-05-11 00:00:00.000
4              2012-05-15 00:00:00.000
5              2012-05-22 00:00:00.000
6              2012-05-23 00:00:00.000
7              2012-05-25 00:00:00.000
8              2012-05-27 00:00:00.000
9              2012-05-28 00:00:00.000

To observe the difference between Step-3 and Step-4 just execute the SQL of Step-3 Again and see the actual execution plan.

SELECT *
FROM   my_TestTab
WHERE  VAL BETWEEN  '05-01-2012' AND  '05-28-2012';



If we see the execution plan we find that

SELECT * FROM  my_TestTab WHERE  [VAL] >= @1 AND [VAL]<=@2

So there is no difference between Step-3 and Step-4. Actually internally the BETWEEN clause is converted to >= and <= logical operators.

Summary

As per me using BETWEEN clause is much easier then the >= and <= operators and it looks great in complex query.
It actually depends on developer and there writing style of T-SQL.




Posted by: MR. JOYDEEP DAS

6 comments:

  1. It also depends on having a clustered index on the column you are comparing. Notice that your query plan shows a clustered index scan (effectively a table scan).

    If you had a clustered index on the column being compared, you would have an index seek, not a scan. This is one of many reasons to NOT have a clustered index on your primary key, but to instead use clustered indexes where the engine would most benefit from it.

    Your example does benefit from having the dates inserted in pre-sorted order. In an OLTP system, it is highly unlikely that your dates will be entered in pre-sorted order.

    ReplyDelete
    Replies
    1. First of all thanks for your interest.
      I am not getting what you mean by INDEXING. Here I am trying to figure out the difference between [ >= AND <= ] and BETWEEN clause. Which one is good? As per me both are the same and if we think about the performance factors both are same (Here in this example I am not considering the pre sorted data).
      If you think that there is some difference between them in term of performance, please explain it.

      Delete
  2. A blast from the past :). This one made me recollect one of my older posts: http://beyondrelational.com/modules/2/blogs/77/posts/11353/date-comparison-dissecting-between-use-between-or-comparison-greater-thanless-than-operators-with-eq.aspx

    ReplyDelete
    Replies
    1. Thanks "Nakul"
      Can u tell me which one is better performance wise?

      Delete
  3. This question seemed to grab my attention so I set up a real world test it and wanted to share my results.

    The following test scenario is using a query to select created dates for users in one my databases.

    The table is about 0.969 MB and has a good load of data for this example with 4750 rows spanning 5 years

    The tables index set up is 1 clustered on the PK and 5 non-clustered indexes exist but none on the create_timestamp column and exactly as Marc pointed out in real world it is unsorted data.

    Query tuning reflected that SQL shows this query missing an index on the date column. For performance the addition of the non-clustered index on this column is recommended. I did not add it but I wanted to point out that it is a recommended change and here is the details of the impact.

    Missing Index Details -
    The Query Processor estimates that implementing the following index could improve the query cost by 94.1297%.

    CREATE NONCLUSTERED INDEX []
    ON [dbo].[user_mstr] ([create_timestamp])

    select create_timestamp from user_mstr
    where create_timestamp between '2007-02-19 11:01:32.727' and '2011-01-28 11:55:42.313'

    This returned 3683 rows.

    Profiler showed the following.
    CPU: 15
    Reads: 132
    Write: 0
    Duration: 4 <=====

    Same query.

    This returned 3683 rows.

    select create_timestamp from user_mstr
    where create_timestamp >= '2007-02-19 11:01:32.727'
    and create_timestamp <= '2011-01-28 11:55:42.313'

    Profiler showed the following.
    CPU: 0
    Reads: 132
    Write: 0
    Duration: 6 <=====

    It’s important to note that behavior was only significantly different on the first execution (prior to caching) afterward, the execution of the two were virtually the same.

    Here are 4 subsequent executions.

    Using Between
    CPU: 0
    Reads: 126
    Write: 0
    Duration: 2 <=====

    Using >=/<=
    CPU: 0
    Reads: 126
    Write: 0
    Duration: 3 <=====

    Using Between
    CPU: 0
    Reads: 126
    Write: 0
    Duration: 3 <=====

    Using >=/<=
    CPU: 0
    Reads: 126
    Write: 0
    Duration: 2 <=====

    In the end I agree with Nakul as using between as a best practice for readability and asthetics but with the above results it seems that Between is also a better choice for performance.

    ReplyDelete
    Replies
    1. Thanks "Jeremiah Davis"
      Thank you for your guideline.

      Delete