Wednesday, 6 June 2012

INNER or LEFT JOIN for Performance


In this article I am trying to discuss a very common scenario that between INNER JOIN and LET JOIN which one is increase the performance and also discuss how to boost the join performance.

First look at the definition of both INNER and LEFT JOIN.

INNER JOIN

The INNER JOIN keyword return rows when there is at least one match in both tables.

LEFT JOIN

The LEFT JOIN keyword returns all rows from the left table (Table-1), even if there are no matches in the right table (Table-2).


General rules

INNER JOIN are usually faster than left joins, but if we need a left join for unmatched results then an inner join will not give you the results that we need.

If we are filtering on the "right" table in a LEFT JOIN, then indeed we should use an INNER JOIN instead, and we may get better performance.

What exactly we have to do

Only thing is that we must look at the execution plan very carefully as the Query use the HASH or MERGE Join operator. As the MERGE JOIN which is more efficient than HASH JOIN for fairly small, sorted tables.

The extra work the left join performs is when a row in the left table has no matching rows in the right table; the result set row contains NULL values for all the selected columns coming from the right table. If this is what we want, then use this type of join and it is the real performance killer.

So it matters depends on situation to situation.


Here are an example of INNER JOIN and a LEFT JOIN which gives us the same result sets.

CREATE TABLE #tem_Tab1
             (emcd      INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
              empname   VARCHAR(50) NOT NULL)
             
INSERT INTO #tem_Tab1
            (empname)
VALUES ('Joydeep'), ('Sangram'), ('Sudip'), ('Tuhin')

CREATE TABLE #tem_Tab2
             (emcd      INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
              empsal    DECIMAL(20,2) NOT NULL)
             
INSERT INTO #tem_Tab2
            (empsal)
VALUES (10000.00), (23000.00), (12000.00)       

SELECT * FROM #tem_Tab1
SELECT * FROM #tem_Tab2

SELECT #tem_Tab1.emcd, #tem_Tab1.empname, #tem_Tab2.empsal
FROM   #tem_Tab1 INNER JOIN #tem_Tab2 ON #tem_Tab1.emcd=#tem_Tab2.emcd


SELECT #tem_Tab1.emcd, #tem_Tab1.empname, #tem_Tab2.empsal
FROM   #tem_Tab1 LEFT JOIN #tem_Tab2 ON 1=1
WHERE  #tem_Tab1.emcd=#tem_Tab2.emcd


Look at the execution plan of both the query and we find the same execution plan for both the query.

To Boost the Join performance


1.    When joining two or more table performance increased, if join columns have index.


2.    Foreign keys are not automatically indexed. So if you ever plan to join a table to the table with the foreign key, using the foreign key as the linking column, then we should consider adding an index to the foreign key column. An index on a foreign key column can substantially boost the performance of many joins.


3.    For best join performance, the indexes on the columns being joined should ideally be numeric data types, not CHAR or VARCHAR, or other non-numeric data types.


4.    For maximum performance when joining two or more tables, the indexes on the columns to be joined should have the same data type, and ideally, the same width.


5.    If our join is slow, and currently includes hints, remove the hints to see if the optimizer can do a better job on the join optimization.


6.    One of the best ways to boost JOIN performance is to ensure that the Joined tables include an appropriate WHERE clause to minimize the number of rows that need to be joined.


7.    In the SELECT statement that creates your JOIN, don't use an * to return all of the columns in both tables.

8.    If you have the choice of using a JOIN or a subquery to perform the same task, generally the JOIN (often an OUTER JOIN) is faster.

Hope you like it.





Posted by: MR. JOYDEEP DAS

4 comments:

  1. Very helpfull article thanks... I think if our second table has less amount of data then subquery is better option.

    ReplyDelete