Friday 29 December 2017

SSIS Data Flow Error Handling with Description and Column Name

Introduction
Finding the Error Description is a simple one line code but finding Error Column name is not so much simple. Microsoft made it easy from SQL 2016, when modified API comes. But before 2016 it’s really a difficult job to find the Error Column Name by using Script Task and Script Component.
This article is trying to provide a simple step by step solution to find the Error Description and Error column name.

Scenario
We are trying to insert records from Flat file to our SQL Table. The flat file has two columns named “roll” and “name”. Sometimes name contains huge text and don’t support by our Table object (Size of the name column in table object is Varchar(50)) and hence generate Error. We are trying to display Error description and the column name that generate the Error.


Control/Data Flow Details



Control Flow – Get Linage ID Task
This Script Task is used to grade the Linage id at run time and save it in a text file.

What to do:
1.    Create a text file C:\temp\write.txt
2.    Create two package level variable named User::execsObj and User::lineageIds as Object type and pass them as Read Write variable of script task.
3.    In Script Task add assembly named Microsoft.SqlServer.DTSPipelineWrap

Namespace needed
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using System.Windows.Forms;
using System.Collections.Generic;
using System.IO;


Script Task Code in C#

public void Main()
{
 // Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
// Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
Dts.Variables["User::execsObj"].Value =(Package)Dts.Variables["User::execsObj"].Parent).Executables;

Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

ReadExecutables(execs);

// Just proof of concept to see the results before you dedicate your time to the solution
// Delete this code in your actual implementation
using (StreamWriter writetext = new StreamWriter(@"C:\temp\write.txt", true))
        {
                foreach (var kvp in lineageIds)
                writetext.WriteLine(kvp.Key + " : " + kvp.Value);
        }
            Dts.TaskResult = (int)ScriptResults.Success;
}

        

private void ReadExecutables(Executables executables)
{
  foreach (Executable pkgExecutable in executables)
    {
     if (object.ReferenceEquals(pkgExecutable.GetType(),            
                 typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
                {
                    TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
                    if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
                    {
                        ProcessDataFlowTask(pkgExecTaskHost);
                    }
                }
     else if (object.ReferenceEquals(pkgExecutable.GetType(), 
                 typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
                {
                    // Recurse into FELCs
                    ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
                }
      }
 }

private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
   MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;

   foreach (IDTSComponentMetaData100 currentComponent in 
                  currentDataFlow.ComponentMetaDataCollection)
            {
              // Get the inputs in the component.
              foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
                  foreach (IDTSInputColumn100 currentInputColumn in 
                          currentInput.InputColumnCollection)
                        lineageIds.Add(currentInputColumn.ID, currentInputColumn.Name);

               // Get the outputs in the component.
               foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
                    foreach (IDTSOutputColumn100 currentoutputColumn in 
                                currentOutput.OutputColumnCollection)
                        lineageIds.Add(currentoutputColumn.ID, currentoutputColumn.Name);
            }
}


Adding Script Component in Data Flow Task
Now add script component in data flow task.


What to do:
1.    Just pass the variable named User::lineageIds as Read variable
2.    Script Component input columns as ErrorCode and ErrorColumn
3.    Script Component output column as ErrorDescription and ErrorColumnDescription with DT_STR data type.

Name space
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Collections.Generic;

Script Component Code in C#
public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;

        int? colNum = Row.ErrorColumn;
        if (colNum.HasValue && (lineageIds != null))
        {
            if (lineageIds.ContainsKey(colNum.Value))
                Row.ErrorColumnDescription = lineageIds[colNum.Value];

            else
                Row.ErrorColumnDescription = "Row error";
        }
        Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
     }









Hope you find it informative.

Posted by: MR. JOYDEEP DAS

Saturday 23 December 2017

What is the best value for the Fill Factor

Introduction
Fill Factor is directly related to Index and Index is directly related to Performance. So for performance point of view Fill Factor play a very important role.
Fill Factor determines the percent of each leaf-level page to be filled with data. By default it is 100 percent. That means all most all the space of 8KB leaf-level page is filled with data.
Here in this article, I am not going to discuss about what fill factor is and how it reduces the index fragmentation. 
If you need the information related to fill factor and how it works, you can search on google. You can find a lot of articles are related to it.
Here in this article I am trying to discuss related to the best value that we can assign to fill factor.
As a SQL developer, we always face this type of scenario. The query in the mind is what the value we choose for fill factor. To resolve this query we just move to DBA and asking them. Sometimes they say 70% or 80%. 
But they don’t provide any explanation for it. Why they have chosen this digit as a value of fill factor and I personally don’t understand how they (DBA) decide them without seeing proper page split. It is really a misty for me. Anyway, they are the Boss and they have full authority to manage Database.

Here in this article I am proving my personal opinion to choosing correct fill factor for index. You can try this or go with the DBA’s definition.
There is no calculative method or mathematical formula to find out the correct fill factor. It’s totally depends on implementation and experience with indexing.
To understand it properly, here I am taking some scenario 

Scenario – 1
I have a table on OLAP environment and there is no DELETE or UPDATE operation is going on. Just new records are inserted.
I suggest that there is no need to implement Fill Factor or go with default Fill Factor 100 Percent.
Sometimes the improper fill factor decreases the performance.
Now let us assume that there is Table 1 which contains the data worth of 1000 pages. All these pages are 100% filled. If I run a query to retrieve all the data from the SQL Server, it should retrieve all the 1000 pages. 
If pages are only 50% failed to accommodate Table 1, it will need 2000 pages, which means SQL Server has to read twice the amount of the data from the disk, leading to higher usage of memory, CPU and IO bandwidth.

Scenario -2
I have a table on OLTP environment and INDERT/UPDATE/DELETE operation is huge.
We can find the Fill Factor of an existing table/Index by

SELECT OBJECT_NAME(OBJECT_ID) Name, type_desc, fill_factor
FROM sys.indexes

First don’t set any fill factor and just create the Index. After one or two week observe the average index fragmentation. 
If the nature table is static and there is no value changed, then there is no need to set any fill factor value.
If the table 
matter is less often Updated table, then set the fill factor value at 95%.
If 
the nature table is frequently Updated table, then set the fill factor value at 70 to 90%

Situation for Fill Factor
Fill Factor %
Static Table – The value of the table never changed
100
Tables Updated Less Often
95
Frequently Updated Table
70 to 90


But all is depends on the ratio of index fragmentation. We must observe the regular index fragmentation ratio and decide the fill factor accordingly. Not to decide any arbitrary digit as a value of fill factor.
We can find the index fragmentation ratio by using this SQL statement

SELECT b.name As [Table Name], c.name As [Index Name], avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID('Practice'), NULL, NULL, NULL , NULL)a
       INNER JOIN sys.tables b WITH (nolock) ON a.OBJECT_ID = b.OBJECT_ID
       INNER JOIN sys.indexes c WITH (nolock) ON a.OBJECT_ID = c.OBJECT_ID
                                                  AND a.index_id = c.index_id
WHERE  b.name = 'ORDER_PRICE'
       AND c.name = 'PK_OrderProduct';


Hope it will be informative.





Posted by: MR. JOYDEEP DAS

Sunday 19 November 2017

SSIS - Script Task is not working on Prod Server

Introduction
A common error that developers are facing when the SSIS package is deployed in the Production Server.
To understand it properly, let’s take a simple example.

Suppose in our package, we have a script task which is used to open an Excel file and just rename the sheet with a dynamic values came from DB table. It works fine in Development environment. But when it is deployed in production server, the script task is showing error.

Reason Behind it
Our development environment contains Microsoft Office installed. To open the Excel file by script task, he needs some reference of Excel (Excel must be register in GAC). As the Microsoft Office is installed in the Development environment the script task have no problem to work.

But in the Production environment, if you observe carefully there is no Microsoft Office component. So in this situation the script task of our package not get any reference of excel that’s the reason of failing script task.

How to Solve it
Simple solution is just installed Microsoft Office in the Production Server. But it is not so easy task as licensing is related to it. Sometimes the client doesn’t want to install it in production server.

To solve such kind of thing, we can suggest client to install just “2007 Microsoft Office system Primary Interop Assemblies” in the production server.


Web link to download:

The 2007 Microsoft Office Primary Interop Assemblies (PIA) redistributable is a Microsoft Windows Installer package that contains the Primary Interop Assemblies for 2007 Microsoft Office products. Specifically, the redistributable contains the PIAs for the following products:


·         Microsoft Office Access 2007
·         Microsoft Office Excel 2007
·         Microsoft Office InfoPath 2007
·         Microsoft Office Outlook 2007
·         Microsoft Office PowerPoint 2007
·         Microsoft Office Project 2007
·         Microsoft Office Publisher 2007
·         Microsoft Office SharePoint Designer 2007
·         Microsoft Office Visio 2007
·         Microsoft Office Word 2007

  
Hope you like it.


Posted by: MR. JOYDEEP DAS

Monday 6 November 2017

With(NOLOCK) Vs With(NOWAIT)

Introduction
We all know about dirty reads and it is not good for transaction table specially OLTP environment.
But sometimes customer wants like we are going to fire the query from frontend, if the table is locked by other transaction it just simply return a message. We don’t want any dirty read or not want to wait until the Transaction finish. If necessary, we again fire the query after sometime to get the result.
Before jumping into this type of solution we are going to show something related to dirty read.

Scenario
We have table named customer.

CREATE TABLE tbl_CustomerDetails
       (
              CustID        Int           NOT NULL IDENTITY Primary Key,
              CustName      Varchar(50)   NOT NULL,
              ContactNo     Varchar(12)   NOT NULL
       );
GO

INSERT tbl_CustomerDetails
       (CustName, ContactNo)
VALUES('Joydeep', '123456'),
      ('Deblina', '897654');

We are trying to update this customer table from session-1 and in the same time we are trying to retrieve record from customer table in session-2.

Session -1
BEGIN TRAN
   UPDATE tbl_CustomerDetails
      SET ContactNo = '99999'
   WHERE CustID = 1;

Please note that we are not providing any Commit or Rollback Transaction option in Session -1.
Now try to retrieve the records from session -2

Session -2
SELECT * FROM tbl_CustomerDetails;

Here we are not getting any output, until the Session -1 is committed or roll backed. Here the update statement of Session – 1 is just make lock on table object.
So it is not going to solve our solutions.
Now, to get the result.

Session -2
SELECT * FROM tbl_CustomerDetails With(NOLOCK);

Here we are using With(NOLOCK) keyword to get the dirty data.
CustID      CustName                                           ContactNo
----------- -------------------------------------------------- ------------
1           Joydeep                                            99999
2           Deblina                                            897654

(2 row(s) affected)

Now think about a situation.
In Session -1 we are using Rollback Transaction after getting the result in Session -2

Session – 1
ROLLBACK TRAN

So, what we get in the Session -2 is not correct data. That is called dirty read.
Now, it is not going to solve our problem as customer doesn’t want to get dirty records.
To solve this problem we are using With(NOWAIT) key word.

Session – 2
SELECT * FROM tbl_CustomerDetails With(NOWAIT);

CustID      CustName                                           ContactNo
----------- -------------------------------------------------- ------------
Msg 1222, Level 16, State 51, Line 1
Lock request time out period exceeded.

It is not waiting or not giving any dirty data. It just simply through an error message saying Lock request time out. It means Session-1 provides a lock on this table and for that it is unable to retrieve data.
So, we have to wait and fire the query again until the Session -1 finishes his transaction.
It is definitely solve our problem statement.
Hope you understand it.




Posted by: MR. JOYDEEP DAS


Sunday 9 July 2017

Extracting Data from Flat file with Different Columns Delimiter

Introduction

Hare in this article, we are going to discuss to retrieve data from flat file. Sounds look easy. No, it’s not. As our flat file columns delimiters are little different.

Sample of Flat file

We have a flat file named StudentRecords.txt



If we look at the flat file we can find that the columns have different type of Delimiter like

Roll  !  StudentName  !  Calss  -  Section

Roll, Student Name and Class are separated by explanation sign (!) and Section is separated by minus sign (-).

Solution Approach

Control Flow Task



Step – 1 [ Retrieve Data from Flat file Using Flat file Connection Manager ]









In the columns Delimiter drop-down we never find the explanatory sign (!). So we need to type it down.

The Columns Delimiter Drop-down Contains

Value
Description
{CR}{LF}
Columns are delimited by a carriage return-line feed combination.
{CR}
Columns are delimited by a carriage return.
{LF}
Columns are delimited by a line feed.
Semicolon {;}
Columns are delimited by a semicolon.
Colon {:}
Columns are delimited by a colon.
Comma {,}
Columns are delimited by a comma.
Tab {t}
Columns are delimited by a tab.
Vertical bar {|}
Columns are delimited by a vertical bar.

Note:  There is an option in this Drop-down that if any column separator is not found, we can directly type it down.


Step – 2 [ Save the records into another Flat file Destination ]



In the destination flat file connection manager, we use column delimiter as minus sign (-).
Now execute the package to generate the destination flat file.

Step – 3 [ Now read the Destination Flat File as Source ]








Step – 4 [ Execute package ]




Hope you like it.




Posted by: MR. JOYDEEP DAS