I was recently tasked with loading tab-delimited flat
files into a staging table on SQL Server database. On the very 1st
look, the requirement looks straight forward isn’t it? But there is a catch.
The format of the file was not fixed and there can be varying number of columns
in the incoming file. Also, the files can have more than 200 columns at a given
time. The requirement is such that that package should not fail while processing
these files and should load files successfully at the end of process. The
solution should be more dynamic-like data load approach rather than handling data
mapping at the design time.
I searched for this scenario over the web and came across a number of posting implementing the similar logic differently.
To start with my development I referred to the information from the following link http://www.citagus.com/citagus/blog/importing-from-flat-file-with-dynamic-columns/ but for some reason it didn’t work out in my environment, especially while looping through & loading varying formatted files & given the fact that I was looking at more than 200-column files. So I enhanced & fine tune the logic a little bit and came up with the following solution.Please find below quick summary of the steps that I followed to achieve this and it has worked well for me.
I searched for this scenario over the web and came across a number of posting implementing the similar logic differently.
To start with my development I referred to the information from the following link http://www.citagus.com/citagus/blog/importing-from-flat-file-with-dynamic-columns/ but for some reason it didn’t work out in my environment, especially while looping through & loading varying formatted files & given the fact that I was looking at more than 200-column files. So I enhanced & fine tune the logic a little bit and came up with the following solution.Please find below quick summary of the steps that I followed to achieve this and it has worked well for me.
Fig 1: Snapshot of the SSIS package that I designed to achieve this.
SCT_Generate
Staging Table - Customize a Script task (I used VB.NET,
but one can use C# as well) to generate dynamic SQL statement to create table based
on the header information retrieved from the sample file. Load dynamically
generated SQL statement into a SSIS string variable. I used variable User::strSQLQuery
in my case. Please refer to
Exhibit 1 for the source code and additional comments.
SQL_Drop Staging Table - Execute SQL Task to drop Staging table (dbo.StagingTable_BulkLoad) from the database. We need this step as we drop and re-create dbo.StagingTable_BulkLoad table with each run based on the dynamic header information available in the flat file.
SQL_Generate Staging Table: Execute SQL task to re-create dbo.StagingTable_BulkLoad table based on dynamic column information. Please refer to Fig 2 for more details.
Bulk_Load Staging Table - Bulk-load tab-delimited flat file into SQL table. Importantly, we cannot use DFT task to achieve this as the format of the incoming file is not known at the design time & it can vary from file to file so we need flexibility in loading the file. Bulk Upload feature provides us that flexibility. Please refer to Fig 3 for more details.
SQL_Drop Staging Table - Execute SQL Task to drop Staging table (dbo.StagingTable_BulkLoad) from the database. We need this step as we drop and re-create dbo.StagingTable_BulkLoad table with each run based on the dynamic header information available in the flat file.
SQL_Generate Staging Table: Execute SQL task to re-create dbo.StagingTable_BulkLoad table based on dynamic column information. Please refer to Fig 2 for more details.
Bulk_Load Staging Table - Bulk-load tab-delimited flat file into SQL table. Importantly, we cannot use DFT task to achieve this as the format of the incoming file is not known at the design time & it can vary from file to file so we need flexibility in loading the file. Bulk Upload feature provides us that flexibility. Please refer to Fig 3 for more details.
Public Sub Main()
Try
'Populate fully qualified File name from the Package
Variables
Dim strFileName As String = Dts.Variables("User::ProcessingDirectory").Value.ToString()
_
+ "\" + Dts.Variables("User::FileName").Value.ToString()
Dim objStrReader As
StreamReader
objStrReader = New
StreamReader(strFileName)
'Read Header Info
Dim strHeaderInfo As String
strHeaderInfo = objStrReader.ReadLine
'Dispose StreamReader Object to Release Memory as we are
intersted only in Header Information
objStrReader.Dispose()
' Split tab-delimited Header Information
Dim ColumnHeaders As String() = strHeaderInfo.Split(vbTab) ' To Hold All Column Names in an array of String
Dim strSQL = "",
ColumnName As String
= "" '
Variables to Hold Generated SQL & Column Name information
Dim ColumnHeaderList As
New ArrayList
For Each ColumnName In ColumnHeaders
Dim Count As Integer = 0
' To Handle Duplicate Columns, if Column Name already
exists then use _1, _2 suffix to differentiate them
If ColumnHeaderList.Contains(ColumnName.ToUpper()) Then
ColumnName = ColumnName + "_"
+ Count.ToString()
Count = Count + 1
End If
ColumnHeaderList.Add(ColumnName.ToUpper)
' Generate Table with Column lengh varchar(max) to cover
any data overflow errors
If strSQL <> ""
Then
strSQL = strSQL & ", ["
& ColumnName.ToString() & "]
VARCHAR(max) "
Else
strSQL = "[" &
ColumnName.ToString() & "] VARCHAR(max)
"
End If
Next
Dim strSQLQuery As String
strSQLQuery
= "CREATE TABLE dbo.StagingTable_BulkLoad
(" + strSQL.ToString() + ")"
Dts.Variables("User::strSQLQuery").Value
= strSQLQuery
Dts.TaskResult = ScriptResults.Success
Catch ex As Exception
Dts.TaskResult = ScriptResults.Failure
End Try
End Sub
Exhibit 1 – VB.NET code block for “SCT_Generate Staging
Table” task to generate dynamic SQL statement based on header information
retrieved from the sample file.
Fig
– 2: Execute SQL task (SQL_Generate Staging Table) to generate dbo.StagingTable_BulkLoad table based on dynamic header
information
Fig
3: Bulk Insert Task (Bulk_Load Staging Data) to bulk load data from Flat File
to SQL table.
Apart from these
settings you will also require a Flat File connection manager & OLEDB Data Source
to configure & to connect to the flat file source and the back-end SQL database at the run-time.
Please give this a try and let me know how it goes. I
have been using this approach in my environment for a while now and it has
worked well for me for now.
Thanks,
No comments:
Post a Comment