SSIS: Make your output files dynamic

I like making my SSIS packages as dynamic as possible. Once that package has been deployed into production I want to avoid opening it up in BIDS if possible. I’ve blogged previously about using Stored Procedures in Execute SQL Tasks but this only gives us flexibility in terms of the where clause. We have no flexibility in terms of the columns unless we open the package up in BIDS.

I wanted to build a little more flexibility into my packages. Wouldn’t it be great if we could simply modify a stored procedure to include additional columns, and these changes would be reflected in the output files, with no further work required? Here’s an outline of my first steps towards achieving this.

  • First create a stored procedure, that returns a resultset, in a test database. Anyone will do. Here’s one I created in the AdventureWorks test database.
USE [AdventureWorks]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_test] AS
BEGIN
 
	SELECT *
	FROM Production.Product;
 
END
  • Open up BIDS and create a new integration services project.
  • Add an object variable like below called resultset.

ssis resultset variable

  • Add an Execute SQL Task to the designer and configure it to execute the procedure you created earlier. Configure it similarly to below, ensuring Result Set is set to Full result set.

ssis exececute sql task

  • Go to the Result Set tab and add a mapping to the resultset object variable. Configure this exactly as shown below.

ssis exec sql task resultset thumb SSIS: Make your output files dynamic

  • Add a Script Task to the designer and connect the Execute SQL Task to it. Edit the Script Task and enter the resultset variable in the ReadOnlyVariable box.

ssis script task

  • Click “Edit Script” and paste the below code into VSTA. The only part of this code you should need to change is the path where the text file is written to.
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic 2008.
' The ScriptMain is the entry point class of the script.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Data.OleDb
Imports System.IO

 _
 _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

    Public Sub Main()
        '
        ' Add your code here
        '
        Dim filename As String = Format(Date.Now, "yyyy-MM-dd_hh_mm_ss") & ".csv"
        Dim fileContents As String = ""

        Dim oledb As OleDbDataAdapter = New OleDbDataAdapter()
        Dim table As DataTable = New DataTable()
        Dim rs As System.Object = Dts.Variables("resultset").Value

        oledb.Fill(table, rs)

        ' Get the column names
        For Each col In table.Columns
            fileContents &= col.ColumnName & "|"
        Next

        ' remove last pipe
        fileContents = fileContents.Substring(0, fileContents.Length - 1)
        fileContents &= Environment.NewLine

        ' For each row in the dataset
        Dim i As Integer
        For Each row As DataRow In table.Rows
            ' For each column in the row
            For i = 1 To table.Columns.Count
                fileContents &= row(i - 1).ToString() & "|"
            Next

            ' Remove last pipe and add a newline to the end of each row
            fileContents = fileContents.Substring(0, fileContents.Length - 1)
            fileContents &= Environment.NewLine

        Next

        ' write data to the text file
        ' Change the path to something appropriate
        SaveTextToFile(fileContents, "C:\Users\Rhys\Desktop\" & filename)

        Dts.TaskResult = ScriptResults.Success
    End Sub

    ' Function from http://www.freevbcode.com/ShowCode.Asp?ID=4492
    Public Function SaveTextToFile(ByVal strData As String, _
     ByVal FullPath As String, _
       Optional ByVal ErrInfo As String = "") As Boolean

        Dim bAns As Boolean = False
        Dim objReader As StreamWriter
        Try

            objReader = New StreamWriter(FullPath)
            objReader.Write(strData)
            objReader.Close()
            bAns = True
        Catch Ex As Exception
            ErrInfo = Ex.Message

        End Try
        Return bAns
    End Function

End Class
  • Save the script and return to the designer. The final package should look something like below.

ssis dynamic output file package

  • Execute the package and then check the location where your output file should be written to. Here’s what the file should look like if you’re using my usp_test stored procedure in the AdventureWorks database.

AdventureWorks Output File Production.Products

  • Now comes the fun part! Modify the usp_test procedure to select from a completely different table. For example;
USE [AdventureWorks]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_test] AS
BEGIN
 
	SELECT *
	FROM HumanResources.Employee;
 
END
  • Ordinarily this change would break an SSIS package. Execute the package again and check the output file.

AdventureWorks Output File HumanResources.Employee

  • Let’s try another version of usp_test.
USE [AdventureWorks]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_test] AS
BEGIN
 
	SELECT 'I can change my output files simply by changing this stored procedure!', GETDATE();
 
END
  • Run the package and check the new output file.

AdventureWorks Output File usp_test

OK, so we now have an SSIS package, producing output files, that copes with complete modifications to stored procedures with no other changes needed. This isn’t yet production ready as there are a few issues I need to resolve.

  • The Script Task seems to buckle for XML data types giving an unsupported conversion error. (Use the Person.Contact table in the AdventureWorks database to view this issue).
  • The package does not seem to cope with a large number of rows. Execution appears to freeze for datasets containing somewhat over 1,000 rows. (Use the Person.Address table in the AdventureWorks database to view this issue). I’m wondering if this is due to size limitation with the System.Object type.

I’ll update this post one I get around to resolving these issues.

Powershell Random sort

Here’s a Powershell snippet that randomly sorts the lines in a file. The snippet below reads a text file, called file.txt, located in your user profile directory. The data in the file will be written back with the lines in a different order.

?View Code POWERSHELL
$data = New-Object System.Object;
$data = Get-Content "$Env:USERPROFILE\file.txt";
# Random sort the lines in the file
$data = $data | sort {[System.Guid]::NewGuid()};
Set-Content "$Env:USERPROFILE\file.txt" $data;

If your file started off like this…

file ordered

It will end up looking something like…

file random sort

This is functionally equivalent to…

SELECT *
FROM dbo.Customers
ORDER BY NEWID();

Postgres Linked Server How To

Just a quick post showing how to add a Postgres database server as a Linked Server in Microsoft SQL Server.

  • Install the psqlODBC driver for Windows.
  • Control Panel > Administrative Tools > Data Sources (ODBC).
  • Click the "System DSN" tab and click the Add button.

Create New Data Source Dialog

  • Choose "PostreSQL ANSI" and click Finish.
  • Configure the data source by entering the server, username, password and database details.

Postgres ODBC Setup

  • Click "Test" to confirm the data source functions. If it fails ensure that Postgres is running and check your configuration details.
  • Click "Save" and then "OK" once successful.
  • Fire up SSMS and run the following T-SQL to create the Linked Server to Postgres.
USE [master]
GO
EXEC master.dbo.sp_addlinkedserver @server = N'POSTGRES', @srvproduct=N'Postgres', @provider=N'MSDASQL', @datasrc=N'PostgreSQL30'
 
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'collation compatible', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'data access', @optvalue=N'true'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'rpc', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'rpc out', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'connect timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'collation name', @optvalue=null
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'query timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'POSTGRES', @optname=N'use remote collation', @optvalue=N'true'
GO
USE [master]
GO
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname = N'POSTGRES', @locallogin = NULL , @useself = N'False'
GO

Alternatively copy the below configuration in SSMS. The most critical thing here is to get the setting for "Data source" correct. This should be the name of the System DSN you added earlier.

Postgres Linked Server Configuration

  • Finally execute the query below. This uses OPENQUERY and will list all the tables in your Postgres system database.
SELECT *
FROM OPENQUERY(POSTGRES, 'SELECT * FROM INFORMATION_SCHEMA.TABLES');

If everything is running correctly you should see something like below.

Postgres openquery result

PsFetch

I really used to like apt-get when I used to run Linux as my home os a few years ago. Once you got the hang of it installing and managing software was easy. I’m obviously pleased to see PsFetch in development. This tool is the Windows Powershell take on apt-get which will allow you to get your hand on cmdlets, scripts, and by the looks of it, tools from codeplex. They’re currently building up a repository with lots of cool scripts.

Read more about PsFetch and sign-up for public beta testing.

PowershellPack STA Error

I’m really into Powershell so I like trying out all the available tools. Somehow I’ve missed the release of the PowershellShellPack which has ten modules offering all kinds of additional functionality including file handling, operating system information, GUI and code generation. I followed the Channel 9 video and ran into a problem on the first example given.

First I ran the command Import-Module PowerShellPack.

Powershell Import Module PowerShellPack

Then I tried to create, and display, a label as in the video example; New-Label "Hello World" -FontSize 40 -Show

Powershell STA modeerror

This resulted in the following error.


New-Object : Exception calling ".ctor" with "0" argument(s): "The calling thread must be STA, because many UI components require this."
At C:\Users\Rhys\Documents\WindowsPowerShell\Modules\WPK\GeneratedControls\PresentationFramework.ps1:30292 char:29
+         $Object = New-Object <<<<  System.Windows.Controls.Label
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

OK, I seem to need to be running in STA mode. I Googled "Powershell STA mode" and discovered a Stackoverflow thread. After reading this I thought I’d try starting Powershell with the switch -STA.

powershell start in sta mode

OK, that seemed to work. Now lets try to create that label again. First load the PowerShellPack modules again since we’ve started a new session.

Import-Module PowerShellPack

Now try to create the label again.

New-Label "Hello World" -FontSize 40 -Show

Powershell Hello World Label

OK, that’s that hurdle over. Now I can get on with exploring what this tool can do for me.

Extract Stored Procedure Comments with TSQL

I’ve blogged before about documenting databases. I’m very much a fan of extracting documentation from systems themselves so it’s as up-to-date as it can be. That’s probably why I’m such a big fan of Powershell a tool that excels at this task. This week I was thinking about how to get at the comments often placed at the top of stored procedure definitions. I’m referring to the little block of comments that Microsoft are encouraging us to fill out when we create a new store procedure.

-- =============================================
-- Author:	<author  ,,NAME>
-- Create date: <create date,,>
-- Description:	<description  ,,>
-- =============================================

Would it not be useful to get our hands on these comments? Here’s quick TSQL script you can use to attempt to extract those comments.

-- Drop temporary tables if they exist
IF OBJECT_ID('tempdb..#ProcDoc') IS NOT NULL
BEGIN
	DROP TABLE #ProcDoc;
END
 IF OBJECT_ID('tempdb..#temp') IS NOT NULL
 BEGIN
	DROP TABLE #temp;
END
 
-- Create a table to hold procedure documentation
CREATE TABLE #ProcDoc
(
	Id INTEGER NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	[Schema] VARCHAR(100),
	[Proc] VARCHAR(100),
	[Author] VARCHAR(100),
	[CreatedDate] VARCHAR(20),
	[Description] TEXT
);
 
DECLARE @proc VARCHAR(100),
		@schema VARCHAR(100),
		@proc_id INTEGER,
		@schema_proc VARCHAR(200);
 
-- Cursor to work through our procs
DECLARE procCursor CURSOR LOCAL FAST_FORWARD FOR SELECT p.[name] AS [proc],
							s.[name] AS [schema]
						 FROM sys.procedures p
						 INNER JOIN sys.schemas s
							ON s.schema_id = p.schema_id;
 
OPEN procCursor;
FETCH NEXT FROM procCursor INTO @proc,
				@schema;
 
WHILE (@@FETCH_STATUS = 0)
BEGIN
 
	INSERT INTO #ProcDoc
	(
		[Schema],
		[Proc]
	)
	VALUES
	(
		@Schema,
		@Proc
	);
 
	SET @proc_id = SCOPE_IDENTITY();
 
	-- Create a temp table to hold comments
	CREATE TABLE #temp
	(
		[Text] VARCHAR(4000) NULL
	);
 
	 -- Build schema + proc string
	SET @schema_proc = @schema + '.' + @proc;
 
	-- sp_helptext to get proc definition
	-- and insert into a temp table
	INSERT INTO #temp
	EXEC sys.sp_helptext @schema_proc;
 
	-- Just an id we'll use later to identify rows
	ALTER TABLE #temp ADD Id INTEGER NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED;
 
	-- Get proc author
	UPDATE #ProcDoc
	SET [Author] = 
	(
		SELECT (SELECT SUBSTRING([Text], PATINDEX('--Author: ', [text]) + 12, LEN([Text]) - (PATINDEX('--Author: ', [text]) + 12))) AS Author
		FROM #temp
		WHERE [text] LIKE '-- Author:%'
	)
	WHERE Id = @Proc_Id;
 
	-- Get proc created date
	UPDATE #ProcDoc
	SET CreatedDate =
	(
		SELECT (SELECT SUBSTRING([Text], PATINDEX('--Author: ', [text]) + 17, LEN([Text]) - (PATINDEX('--Author: ', [text]) + 17)))
		FROM #temp
		WHERE [text] LIKE '-- Create date:%'
	)
	WHERE Id = @Proc_Id; 
 
	-- Get proc description
	-- Bit messy here but works for my situation
	-- probably need modification dpending on your commenting habits
	UPDATE #ProcDoc
	SET [Description] = REPLACE(CONVERT(VARCHAR(4000),  
	(
		SELECT *
		FROM
		( 
			SELECT REPLACE(REPLACE([text], '-- Description:', ''), '-- ', '') AS [text()]
			FROM #temp
			WHERE Id >= (SELECT Id FROM #temp WHERE [text] LIKE '-- Description:%')
			AND Id < (SELECT (Id - 1) FROM #temp WHERE [text] LIKE '%CREATE PROCEDURE%')
		) AS t FOR XML PATH('')
	)), '&#x0D;', '') -- Replace CR entities
	WHERE ID = @Proc_id;
 
	-- Drop the temp table
	DROP TABLE #temp;
 
	-- Get the next row
	FETCH NEXT FROM procCursor INTO @proc,
					@schema;
 
END
 
-- Clean up
CLOSE procCursor;
DEALLOCATE procCursor;
 
-- View procedure documentation
SELECT *
FROM #ProcDoc
WHERE [Description] IS NOT NULL;

All tables are temporary so there’s nothing to clean up. You’ll probably have to tweak this script as your commenting habits will likely vary. If all goes well the script should produce something looking like below.

extracting stored procedure comments

British Government launches free data site

Just a quick post about the British Governments launch of data.gov.uk offering free access to a large number of datasets. It’s good to see this data, that we as taxpayers have funded, available for free use. Hats off to The Guardian’s long running free our data campaign.

The site is being officially launched by Sir Tim Berners Lee tomorrow, but you can access the site now. The US has their own version data.gov but the British site boasts nearly three times as many datasets. Take a peek at the available datasets. It always fantastic looking through the great number of apps that people create around freely available data. Why not take a look and start your own project?

Those interested in London specific data can check out the London DataStore. People in sunny San Francisco have datasf and Chicago also have a dataset section on their website. Read more about this.

Execute SQ… Procedure Task

I’ve been having a bit of a debate with some colleagues today about the Execute SQL Procedure Task in SSIS. Is it ok to enter raw SQL queries into this task or should everything be enclosed within a Stored Procedure? My view…

execute sql task bad red x
execute procedure task good thumb Execute SQ... Procedure Task green tick

For me, any day of the week, it should always be a proc. Why on earth would you want to enter raw queries into this task? I would argue against this even for basic message logging. Enclosing your TSQL in stored procedures gives you much more flexibility. It’s so much easier to make minor modifications to a proc. Would you rather have to go through the rigmarole of opening and deploying SSIS packages. Sure, I know you’ll have to open the package if you introduce, or remove, new columns. But you’re pretty much free to modify your WHERE without issue. Lets not forget this also allows your system to be modified by those who don’t know SSIS. Perhaps you’ll be able to enjoy your holiday a little better next year.

I’m even tempted to open a Microsoft Connect item to fix this ‘bug’. Perhaps one day I will be lucky enough to see this task available in my toolbox.

execute procedure task thumb Execute SQ... Procedure Task

View or function ‘dbo.Viewname’ has more column names specified than columns defined

If you ever encounter this SQL Server error when selecting from a view then somebody has probably dropped columns from the base table. Here’s a quick run through of the problem.

CREATE TABLE dbo.Contact
(
	Id INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	FirstName VARCHAR(50) NOT NULL,
	LastName VARCHAR(50) NOT NULL,
	DOB DATETIME NOT NULL,
	Phone VARCHAR(30) NULL,
	Email VARCHAR(200) NULL,
	Mobile VARCHAR(20) NULL,
	Website VARCHAR(100) NULL
);

Insert a test record.

INSERT INTO dbo.Contacts
(
	FirstName,
	LastName,
	DOB,
	Email,
	Website
)
VALUES
(
	'Rhys',
	'Campbell',
	'01-Jun-80',
	'noone@tempinbox.com',
	'http://www.youdidwhatwithtsql.com'
);

Create a view on this table.

CREATE VIEW vw_Contacts
AS
	SELECT *
	FROM dbo.Contacts;

Verify the view is functional

SELECT *
FROM vw_Contacts;

sql server view results

Now drop a column from the Contacts table.

ALTER TABLE dbo.Contacts DROP COLUMN Phone;

Now try selecting from the view again.

SELECT *
FROM vw_Contacts;

View or function 'vw_Contacts' has more column names specified than columns defined

So what’s going on here? The view is expecting the Phone column to still be in the Contact table. We created our view with an asterisk so this shouldn’t matter right? Uvavu! SQL Server stores metadata about the view when you create it. Changes to underlying tables can cause issues. Luckily the fix is easy.

EXEC sp_refreshview 'vw_Contacts';
SELECT *
FROM vw_Contacts;

fixed sql server view

This procedure will update the metadata stored by the view making it functional again. Presumably this is akin to dropping and recreating the view. This is a fairly trivial example but in a large system, with lots of views, and lots of developers, this could cause big headaches. I recommend you read the MSDN page for sp_refreshview. There’s a couple of useful scripts in the comments section for making light work of this.

Using .Net libraries in Powershell

One of the great things about Powershell is its ability to take advantage of the .Net platform. If Powershell can’t do it you bet there’s a .Net library that can. Not only is this great for extending your scripts but I also like to use Powershell to test some of my classes. Rather than fire up RAM hungry Visual Studio I can just script out a few tests in Powershell.

Here’s quick script you can modify to compile a .cs file to produce a library for Powershell to load. This gives me the flexibility to quickly test individual classes from a project. The class I was testing here is just a wrapper around the functions of DotNetZip. A reference is included, just remove this if your class doesn’t need it. At the end of the script the .dll file will be loaded and its methods will be listed by a call to Get-Member.

?View Code POWERSHELL
# Your c# class we want to compile as a library to test
$csharp_file = "`"C:\Users\Rhys\Documents\Visual Studio 2008\Projects\ETLBuddy\ETLBuddy\Zipper.cs`"";
# Path to csc compiler
$compiler = "$env:windir\Microsoft.NET\Framework\v3.5\csc.exe";
$cmd = "$compiler /target:library $csharp_file "
# Add any external library you're referencing, empty string if none
$reference="C:\Users\Rhys\Desktop\Ionic.Zip.dll";
 
# Add reference if needed
if($reference -ne "")
{
	$cmd += " /r:`"$reference`"";
}
 
# Invoke-Expression tells ps to evaluate the string as a command
# otherwise it would just print it out
Invoke-Expression $cmd;
 
# Now load the dll
$dll = Get-Location;
$dll = $dll.ToString();
$dll = $dll + "\" + $csharp_file.Substring($csharp_file.LastIndexOf("\") + 1) -replace(".cs", ".dll") -replace("`"", "");
 
[Reflection.Assembly]::LoadFile($dll -replace(".cs", ".dll"));
 
# Create a new object from the loaded dll
$obj = New-Object "com.etlbuddy.ETLBuddy.Zipper";
$obj | Get-Member;

From the output here you can see there are two methods called Zip and Unzip.

powershell .net library methods

Now I have a dll I can put to work in my Powershell Scripts! Once this is done it’s really quite simple to get started.

?View Code POWERSHELL
# Load the Zipper dll
# Needed to run gacutil /i "C:\Users\Rhys\Desktop\Ionic.Zip.dll" so this would work
[Reflection.Assembly]::LoadFile("C:\Users\Rhys\Documents\powershell\Zipper.dll") | Out-Null;
 
# Create a new zip object
$obj = New-Object "com.etlbuddy.ETLBuddy.Zipper";
 
# Zip a copy of a library
$obj.Zip("C:\Users\Rhys\Documents\powershell\Zipper.dll");

After execution I have a new zip file in my Powershell working directory.

file zipped by powershell

That’s really all there is to begin using an external library in your Powershell scripts. For further info checkout How to create an object in Powershell.