SQL 301 – ETL With SSIS, Part 1
I’ve discussed SSIS on this site before, but I’ve heard the feedback, you want real world examples. So, I decided to use the business case I address most often using SSIS, and that’s Exchange Transformation and Load.
Let’s dive right in to this problem. You receive a file via FTP from a vendor every day. You need to import this data into your database. The data file is a comma delimited text file. This process has become so common I have a design template I follow when developing this solution.
I search the source folder for files matching a certain naming scheme, I then import the files into a “raw” table. A raw table is simply a table with the right number of columns, yet all the data types are VARCHAR. That way if they pass a bad data type, I can test for that, and fail an individual record, rather than failing the whole file. Once the data is in the raw file, I then scrub the data. I’m looking for bad data types, or inconsistent data. Once I’ve handled all the problems, then I import the data into the “production” table, or a table that will be used by other processes.
Right. Let’s get started
Open Visual Studio (in this example I’ll be using Visual Studio 2008), and hit File >New >Project. Choose the project template for Integration Services Project. Give your project a name, in my example I’ve named it GenericFileImport. You can create this project wherever you like. Hit OK on the new project wizard to continue.
When Visual Studio loads, you should see a pretty blank screen. This is your canvas.
Right click on package.dtsx in the solution explorer. By default it’s on the right. Choose rename. I chose to call mine GenericFileImport.dtsx.
When you rename your package, you will get a dialog box asking if you want to rename the package object as well. Trust me, you want to hit yes. Information inside the package won’t match, and you’ll have issues further down the road. So hit yes on that dialog box.
Now that we have that, let’s start with the variables. There are some parameters we’re going to want to set when we run this package.
- We want to be able to change the path where we expect to find the files. Right now we get this via ftp, what if we wanted to get it from a file share instead?
- We want to be able to change the file pattern, imagine the vendor wants to change from a txt file to a csv file.
- We want to be able to set our archival location. Right now we may store these on one server, but what if we have a new file server added to our network later?
In order to do this easily, we will need to use variables. In order to add variables, hit View > Other Windows > Variables. A panel should slide out from the right (by default). This is your Variables viewer. Click the first icon, and you’ll be able to add a variable.
First we’re going to add our SearchPath variable to store the folder we want to search for our file. We enter the name and leave the scope as GenericFileImport (This makes the variable global. In a later post I’ll cover the various scopes and when you should use each. For now, let’s leave this default.) Next we change the data type to string, and for a value, let’s have that point to a test folder.
In the project folder I’ll create a sub folder called “sourcefolder”. I’ll then use the full path to that folder as my test. I would like to take an moment to mention a common mistake. When you’re dealing with paths, you have to make a call, am I always going to end with a \ or not. In this example, paths will always end with a \. If you have problems getting pathing to work in your packages, I suggest you check to see which you’ve chosen, to include the slash or not.
Ok, while we’re here, we’ll go ahead and set up a few more variables.
- DestinationPath, a string. I would suggest setting up a test folder in your project to stand in as your destination. I called mine destinationfolder.
- ArchivePathAndFileName, a string. We’ll set this in a scripting object later, you don’t have to fill this in yet. But you will want to put an archive sub folder in your destination folder.
- ErrorPathAndFileName, a string. Again, we’ll set the value in a scripting object. Again, I’d create a folder under your destination folder called error to that will hold your error files.
- FileName, a string. This will be the name of the file we find in the search folder. You can leave this blank for now.
- SearchFilePattern, a string. This is the pattern that will be used by the package to find the source file. In this case we’re going to look for a .txt file. Go ahead and download this file now, and save it to the source folder you created earlier.
Once you’ve set up all those variables, you’re ready to continue. Take a look at the following screen cap to make sure we’re on the same page.
Before we end for today, we need to set up data connections. Without them, this whole process couldn’t work. We’re going to set up two connections, one to our source data file, and the other to our database server.
To set up connections, look at the bottom of Visual Studio.
Yeah, it really is that easy. Right click in that area, and choose New Flat File Connection. You’ll need to fill in a Connection ManagerName, I chose NewFile. Then browse to the demo file you’re using to test your import. Luckily for this demo, all the default options on this screen can be left alone. Our demo file is a delimited file. It isn’t using any text qualifiers, like quotes. We don’t want to skip the first row. The only option we have to change on this first screen is check the box “Column names in the first data row”.
Check the image to the left to compare how you set up your file connection.
You will want to click the columns tab on the left, in order to proof your file connection. Verify the columns appear correctly. In other words, does it look like a spreadsheet, or is the text all run together?
If it looks like a spreadsheet, you’re finished with this connection. If it doesn’t, please reach out to me, and we’ll work through your issue.
Hit Ok to save this connection.
Next we’re going to Set up a connection to our database server. In my case I’m connecting to the training server. You’ll need a connection to a database server where you have a copy of AdventureWorks installed.
To set up a sql connection, right click in the connection manager section of Visual Studio and choose “New OleDB Connection” The “Configure OLE DB Connection Manager” will open. If you haven’t set up a connection before, then you’ll have to define your connection now. Hit New to start a new connection manager.
You’ll need to enter the server name, and your authentication method. Then choose the Database AdventureWorks. Be sure to click “Test Connection.” That way you can verify you can connect to the server.
Once your connection is verified, click ok, then ok. you now have two connections in your package, one for your flat file source, the other for your database destination. It’s just a decorative thing, but I usually right click and rename my database connections to just the database name. Since the server name and credentials are almost always variable… there’s no need to save that information to the connection name.
Let’s stop here for now. Next time we’ll pick up with a for each loop and a scripting task. With these we’ll be able to handle multiple files loaded to our SearchPath, and for each file we’ll be able to set the FileName, ArchivePathAndFileName and ErrorPathAndFileName. Necessary steps we have to take in order to import this file into our database.
If you have any questions so far, please, let me know! I’m here to help.