Sunday 26 May 2013

Fuzzy Lookup Transform


Introduction

Real-world data is "dirty" because of misspellings, truncations, missing or inserted tokens, null fields, unexpected abbreviations, and other irregularities. Fuzzy lookup enable us to match input records with clean, standardize records in a reference table. To understand it properly let's take an example. Suppose we have customer information like customer name and address. During the sales transaction we take the input for customer name and address which may not be matched exactly with records in the customer reference table because of typographical or others error in the input data. Fuzzy lookup returns the best matching records from the customer reference table even if no exact match exists.

So the fizzy lookup is a very useful transform for every SSIS developer in the real-world environment.  In this article we are going to learn about it.

How we use the Fuzzy Lookup Transform

Step-1 [ Create the Fuzzy Lookup Reference Table ]

-- Fuzzy Lookup Reference table
IF OBJECT_ID(N'tbl_FUZZYREFERENCES', N'U') IS NOT NULL
   BEGIN
      DROP TABLE tbl_FUZZYREFERENCES;
   END
GO
CREATE TABLE tbl_FUZZYREFERENCES
       (FIRSTNAME VARCHAR(50)  NOT NULL,
        LASTNAME    VARCHAR(50)  NOT NULL,
        DOB         DATETIME);
GO          
--Inserting Records
INSERT INTO tbl_FUZZYREFERENCES
       (FIRSTNAME, LASTNAME, DOB)
VALUES ('Joydeep', 'Das', '12-17-1974'),
       ('Shipra', 'Roy Chowdhury', '09-22-1974'),  
       ('Deeepasree', 'Das', '01-31-2003');
GO
SELECT * FROM tbl_FUZZYREFERENCES;                  

FIRSTNAME                        LASTNAME                                         DOB
Joydeep                                  Das                                                         1974-12-17 00:00:00.000
Shipra                                     Roy Chowdhury                                      1974-09-22 00:00:00.000
Deeepasree                             Das                                                         2003-01-31 00:00:00.000

Step-2 [ The Source Flat File ]

The flat file name is "FuzzySourceRecords.txt"



Step-3 [ Data Flow ]



Step-4 [ Fuzzy Lookup Transform Editor ]





Step-5 [ Derived Column Transform Editor ]



Step-6 [ Union All Transform Editor ]



Step-7 [ OLE DB Destination Create Table ]



Step-8 [ Running the Package ]



Step-8 [ Final Destination Table ]






Hope you like it.




Posted by: MR. JOYDEEP DAS

2 comments: