SQL 301 – ETL With SSIS, Part 2
In my last post on SSIS, I started you into developing an ETL solution with SSIS. I walked you through setting up your solution, renaming the package, and setting up the variables we would use during the rest of these lessons. Today, we’ll pick up with a Foreash Loop Container.
In most case, you’re going to deal with a vendor that will send you a file every day. You’ll want to be able to handle the situation where the vendor misses a day, and sends you two files the next day instead. If you want to handle that easily, then you set up your entire package inside a for each loop.
A for each loop does exactly what it says, it will do whatever you put inside the for each loop for each item in a collection it finds. You can do something for each file in a folder, for each record in a record set, each node in an XML element, or each object in a collection of Server Management Objects (SMO). In our case, we’re going to want to load each file it finds in a folder.
I would suggest giving the object a useful name, such as “Foreach file matching SearchFilePattern in SearchPath”.
Next, click over to the collection page.
On this page, we’re going to configure the for each loop object to search our SearchPath for the SearchFilePattern. To do that, we’re going to use Expressions. Expressions are a combinations of variables, functions and literals used together to create a single value. What they are will make more sense in a moment. Click the […] icon to the right of Expressions to define your first expression.
Once the Property Expressions Editor opens, you can click in the first row beneath property and choose Directory, since we’re going to define an expression to set where the for each loop should look.
On the line below that, choose FileSpec, we’re going to set that to SearchFilePattern.
Once you’ve got two lines, click the […] button to the right of Directory, and you’ll get the Expression Builder.
Pivot open the Variables folder in the left half of the Builder, then scroll down to the User::SearchPath entry. Click and drag it down to the Expression block. Double clicking an entry in the variables section does nothing.
While we’re here I’d like to point out we can do more than use variables. Like I mention before we can add a literal to the expression. Let’s say you didn’t set all your paths to end with a slash, but you were concatenating a path variable and a filename variable to make a file path… you’d have to add + “/” + between the two variables in order to add the slash you need between the path and the file name.
In our current example, we just need to assign the variable SearchPath to the directory expression for the for each loop. Repeat this process for the FileSpec expression. Set it to SearchFilePattern. Once you’ve done that, hit OK to return to the Foreach Loop Editor.
Even with the expressions defined, I enter the folder, and files values in the Foreach Loop Editor. That way you can test your code. I also code my solutions to expect the for each loop to pass back just the filename and extension, so change the radio button in the “Retrieve file name” section to “Name and extension”.
Simply click on the line under Variable, and choose User::FileName, and leave the index to zero. Since that’s the first value that will return from our for each loop, you don’t have to change that… if you want other variables set, then you’d go messing with that… but for now, leave it alone.
We’re now finished with the for each loop. Hit ok to continue.
The next thing we’re going to cover is a Script Task. In the toolbox, click and drag a Script task to the inside of the for each loop we created earlier.
Don’t ask me Why, but the editor should start with the General tab first. Click over to general and give your Script task a useful name. I’m choosing: “1-Set values for archive and error file paths”. I number my steps to make reading the log files a little easier.
Once you’ve named your script task, click back over to the Script tab. On this screen we need to set 3 pieces of information. First the scripting language. Since SSIS is built with .NET you can choose C# or Visual Basic. I’m going to leave this option to you. I’ll use VB in this lesson, but in future lessons you’ll see some C#.
Next, we want to tell the script what variables it can have access to. You can set variables as read only to the script, or read and write. Basically, will the script need to change values stored in those variables or not. In our example, User::DestinationPath, User::FileName, and User::SearchPath will not be changed in this script, but are required to compute the archive and error file paths. So we enter them as ReadOnlyVariables
That leaves us putting User::ArchivePathAndFileName and User::ErrorPathAndFileName as ReadWriteVariables.
Once we have that, click Edit Script to begin editing.
All of the editing you’re going to do in this script will be in the Main() function. If you’re not familiar with programming concepts, please let me know, and I’ll get you up to speed!
First we’re going to set the ConnectionString for our flat file source to the file found by the for each loop. To do that, we add the following line to the Main function.
All we do is take the SearchPath variable and Concatenate the FileName variable onto it. Yes, we could have captured the filename and path by the loop itself, but for now, practice some coding. When you learn to improve your SSIS Packages, feel free to do it either way!
If you’re debugging your code, you might want to know how to throw values to a dialog box. That way you can see what’s going on in the script. To throw the NewFile Connection string to a dialog box, include the following command to your script.
But remember, before releasing this to production, comment out that line by adding a single quote to the beginning of the line.
Next we’re going to set the ArchivePathAndFileName. This one is a bit more complex.
Again, we could have set this in an expression, but that’s the great thing about T-SQL and SSIS, there are always several ways to accomplish the same goal. In this case we want to take the DestinationPath variable, add on archive\, so it will put the files in the archive subfolder in the destination folder. In our example, the vendor reuses file names, and we don’t want our archive to overwrite older files, so we take the given file name, take of the name part, add the current date onto that name, then put the extension back on the end.
That way filename.txt gets saved as filename_0090112.txt in the archive folder.
We repeat the process for the error file path.
Ok, with that, if you want to debug, feel free. That way you can see the source filepath, and the destination paths. Your package won’t really do anything just yet. We’ll fix that soon enough. Next time, we’re going to cover the structure of our raw table, and set it up. We’ll talk about Truncating that table before each attempt to load the data from the text file to the raw table. Then we’ll talk about the Data Flow Task.
Until then, if you have any questions, please let me know! I’m here to help you understand SSIS better.