Saturday, April 18, 2015

Dynamic Data Load - SSIS

I was recently tasked with loading tab-delimited flat files into a staging table on SQL Server database. On the very 1st look, the requirement looks straight forward isn’t it? But there is a catch. The format of the file was not fixed and there can be varying number of columns in the incoming file. Also, the files can have more than 200 columns at a given time. The requirement is such that that package should not fail while processing these files and should load files successfully at the end of process. The solution should be more dynamic-like data load approach rather than handling data mapping at the design time.

I searched for this scenario over the web and came across a number of posting implementing the similar logic differently.

To start with my development I referred to the information from the following link http://www.citagus.com/citagus/blog/importing-from-flat-file-with-dynamic-columns/ but for some reason it didn’t work out in my environment, especially while looping through & loading varying formatted files & given the fact that I was looking at more than 200-column files. So I enhanced & fine tune the logic a little bit and came up with the following solution.Please find below quick summary of the steps that I followed to achieve this and it has worked well for me.



 Fig 1: Snapshot of the SSIS package that I designed to achieve this.

SCT_Generate Staging Table - Customize a Script task (I used VB.NET, but one can use C# as well) to generate dynamic SQL statement to create table based on the header information retrieved from the sample file. Load dynamically generated SQL statement into a SSIS string variable. I used variable User::strSQLQuery in my case. Please refer to Exhibit 1 for the source code and additional comments.

SQL_Drop Staging Table - Execute SQL Task to drop Staging table (dbo.StagingTable_BulkLoad) from the database. We need this step as we drop and re-create dbo.StagingTable_BulkLoad table with each run based on the dynamic header information available in the flat file.

SQL_Generate Staging Table: Execute SQL task to re-create dbo.StagingTable_BulkLoad table based on dynamic column information. Please refer to Fig 2 for more details.

Bulk_Load Staging Table - Bulk-load tab-delimited flat file into SQL table. Importantly, we cannot use DFT task to achieve this as the format of the incoming file is not known at the design time & it can vary from file to file so we need flexibility in loading the file. Bulk Upload feature provides us that flexibility. Please refer to Fig 3 for more details.


          Public Sub Main()

        Try
            'Populate fully qualified File name from the Package Variables
            Dim strFileName As String = Dts.Variables("User::ProcessingDirectory").Value.ToString() _
                                + "\" + Dts.Variables("User::FileName").Value.ToString()
            Dim objStrReader As StreamReader
            objStrReader = New StreamReader(strFileName)

            'Read Header Info
            Dim strHeaderInfo As String
            strHeaderInfo = objStrReader.ReadLine

            'Dispose StreamReader Object to Release Memory as we are intersted only in Header Information
            objStrReader.Dispose()

            ' Split tab-delimited Header Information
            Dim ColumnHeaders As String() = strHeaderInfo.Split(vbTab) ' To Hold All Column Names in an array of String
            Dim strSQL = "", ColumnName As String = "" ' Variables to Hold Generated SQL & Column Name information
            Dim ColumnHeaderList As New ArrayList

            For Each ColumnName In ColumnHeaders
                Dim Count As Integer = 0

   ' To Handle Duplicate Columns, if Column Name already exists then use _1, _2 suffix to differentiate them
                If ColumnHeaderList.Contains(ColumnName.ToUpper()) Then
                    ColumnName = ColumnName + "_" + Count.ToString()
                    Count = Count + 1
                End If

                ColumnHeaderList.Add(ColumnName.ToUpper)

                ' Generate Table with Column lengh varchar(max) to cover any data overflow errors
                If strSQL <> "" Then
                    strSQL = strSQL & ", [" & ColumnName.ToString() & "] VARCHAR(max) "
                Else
                    strSQL = "[" & ColumnName.ToString() & "] VARCHAR(max) "
                End If

            Next

            Dim strSQLQuery As String

            strSQLQuery = "CREATE TABLE dbo.StagingTable_BulkLoad (" + strSQL.ToString() + ")"
            Dts.Variables("User::strSQLQuery").Value = strSQLQuery
            Dts.TaskResult = ScriptResults.Success

        Catch ex As Exception

            Dts.TaskResult = ScriptResults.Failure
        End Try

    End Sub

Exhibit 1 – VB.NET code block for “SCT_Generate Staging Table” task to generate dynamic SQL statement based on header information retrieved from the sample file.



Fig – 2: Execute SQL task (SQL_Generate Staging Table) to generate dbo.StagingTable_BulkLoad table based on dynamic header information



Fig 3: Bulk Insert Task (Bulk_Load Staging Data) to bulk load data from Flat File to SQL table.

Apart from these settings you will also require a Flat File connection manager & OLEDB Data Source to configure & to connect to the flat file source and the back-end SQL database at the run-time.

Please give this a try and let me know how it goes. I have been using this approach in my environment for a while now and it has worked well for me for now.

Thanks,
Rishi Saxena
View Rishi Saxena's profile on LinkedIn

No comments:

Post a Comment