cover
titlepage

Table of Contents

Cover

Welcome to SISS

Section 1

Chapter 1: Moving Data with the Import and Export Wizard

Try It

Chapter 2: Installing SQL Server Integration Services

Chapter 3: Installing the Sample Databases

Try It

Chapter 4: Creating a Solution and Project

Try It

Chapter 5: Exploring SQL Server Data Tools

The Solution Explorer

Deployment Models

The Properties Window

The Toolbox

The SSDT Design Environment

Chapter 6: Creating Your First Package

Creating and Using Connection Managers

Using and Configuring Tasks

Exploring Package Encryption

Executing Packages

Try It

Chapter 7: Upgrading Packages to SQL Server 2012

Try It

Chapter 8: Upgrading to the Project Deployment Model

Try It

Section 2

Chapter 9: Using Precedence Constraints

Try It

Chapter 10: Manipulating Files with the File System Task

Try It

Chapter 11: Coding Custom Script Tasks

Try It

Chapter 12: Using the Execute SQL Task

Try It

Chapter 13: Using the Execute Process Task

Try It

Chapter 14: Using the Expression Task

Try It

Chapter 15: Using the Send Mail Task

Try It

Chapter 16: Using the FTP Task

Try It

Chapter 17: Creating a Data Flow

Try It

Section 3

Chapter 18: Extracting Data from Sources

Source Assistant

OLE DB Source

Excel Source

Flat File Source

Chapter 19: Loading Data to a Destination

Destination Assistant

OLE DB Destination

Flat File Destination

Excel Destination

Chapter 20: Changing Data Types with the Data Conversion Transform

Try It

Chapter 21: Creating and Replacing Columns with the Derived Column Transform

Try It

Chapter 22: Rolling Up Data with the Aggregate Transform

Try It

Chapter 23: Ordering Data with the Sort Transform

Try It

Chapter 24: Joining Data with the Lookup Transform

Cache Modes

The Cache Connection Manager and Transform

Chapter 25: Auditing Data with the Row Count Transform

Try It

Chapter 26: Combining Multiple Inputs with the Union All Transform

Try It

Chapter 27: Cleansing Data with the Script Component

Try It

Chapter 28: Separating Data with the Conditional Split Transform

Try It

Chapter 29: Altering Rows with the OLE DB Command Transform

Try It

Chapter 30: Handling Bad Data with the Fuzzy Lookup

Try It

Chapter 31: Removing Duplicates with the Fuzzy Grouping Transform

Try It

Section 4

Chapter 32: Making a Package Dynamic with Variables

Try It

Chapter 33: Making a Package Dynamic with Parameters

Try It

Chapter 34: Making a Connection Dynamic with Expressions

Try It

Chapter 35: Making a Task Dynamic with Expressions

Try It

Section 5

Chapter 36: Loading Data Incrementally

Try It

Chapter 37: Using the CDC Components in SSIS

CDC Control Task

CDC Source Task

CDC Splitter Task

Try It

Chapter 38: Using Data Quality Services

Try It

Chapter 39: Using the DQS Cleansing Transform

Try It

Chapter 40: Creating a Master Package

Try It

Section 6

Chapter 41: Using Sequence Containers to Organize a Package

Try It

Chapter 42: Using For Loop Containers to Repeat Control Flow Tasks

Try It

Chapter 43: Using the Foreach Loop Container to Loop Through a Collection of Objects

Try It

Section 7

Chapter 44: Easing Deployment with Configuration Tables

Try It

Final Deployment

Chapter 45: Easing Deployment with Configuration Files

Try It

Final Deployment

Chapter 46: Configuring Child Packages

Configuring an Execute Package Task

Configuring a Child Package

Try It

Section 8

Chapter 47: Logging Package Data

Try It

Chapter 48: Using Event Handlers

Creating Event Handlers

Common Uses for Event Handlers

Try It

Chapter 49: Troubleshooting Errors

Working in the Progress Tab

Troubleshooting Steps

Try It

Chapter 50: Using Data Viewers

Try It

Chapter 51: Using Breakpoints

Try It

Section 9

Chapter 52: Creating and Configuring the SSIS Catalog

Creating the Catalog

Configuring the Catalog

Creating and Using Folders

Try It

Chapter 53: Deploying Packages to the Package Catalog

Using the Deployment Wizard

Deploying Packages in the Package Deployment Model

Try It

Chapter 54: Configuring the Packages

Creating Environments

Configuring the Package

Try It

Chapter 55: Configuring the Service

Try It

Chapter 56: Securing SSIS Packages

Securing Packages in the Package Deployment Model

Securing Packages in the Project Deployment Model

Try It

Chapter 57: Running SSIS Packages

Executing Packages in the Package Deployment Model

Running Packages in the Project Deployment Model

Try It

Chapter 58: Running Packages in T-SQL and Debugging Packages

Running the Package

Debugging When Something Goes Wrong

Try It

Chapter 59: Scheduling Packages

Using Proxy Accounts

Try It

Section 10

Chapter 60: Dimension Load

Try It

Chapter 61: Fact Table Load

Try It

Section 11

Chapter 62: Bringing It All Together

Lesson Requirements

Hints

Step-by-Step

Appendix A: SSIS Component Crib Notes

When to Use Control Flow Tasks

When to Use Data Flow Transforms

Appendix B: Problem and Solution Crib Notes

Appendix C: What’s on the DVD?

System Requirements

Using the DVD

What’s on the DVD

Troubleshooting

Customer Care

Preface

John Wiley & Sons, Inc. End-User License Agreement

To the person who gave me my first chance at writing, Steve Wynkoop.—Brian Knight

Philippians 4:13—Devin Knight

To my wife Jessy, my son Gabriel, and my daughter Sydney; they are the reason I strive for more.—Mike Davis

Vickie is my wife’s name. She is patient, where I am impatient. She is at her best when things seem to be at their worst. In my experience, that is a rare trait. Her love, support, and understanding is what allows me, from a personal standpoint, to work on a project such as this. Most of my work on this project has been nights and weekends, in addition to a regular work week. For me, this is not possible without a good home life. Vickie—this is dedicated to you. —Wayne Snyder

About the Authors

ffirsg01.tif

Brian Knight, SQL Server MVP, MCITP, is the owner and founder of Pragmatic Works. He is the cofounder of BIDN.com, SQLServerCentral.com, and SQLShare.com. He runs the local SQL Server users group in Jacksonville (JSSUG). He is a contributing columnist at several technical magazines. He is the author of 15 SQL Server books. Brian has spoken at conferences like PASS, SQL Connections and TechEd, SQL Saturdays, Code Camps, and many pyramid scheme motivational sessions. His blog can be found at http://www.bidn.com, which covers many BI topics and miniature donkey training tips. Brian lives in Jacksonville, Florida, where he enjoys his kids and running marathons.

ffirsg02.tif

Devin Knight is a Senior BI consultant at Pragmatic Works Consulting. Previously, he has tech edited the book Professional Microsoft SQL Server 2008 Integration Services and was an author on the books Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services, Knight's Microsoft Business Intelligence 24-Hour Trainer, and SharePoint 2010 Business Intelligence 24-Hour Trainer. Devin has spoken at past conferences like PASS, SQL Saturdays, and Code Camps and is a contributing member to the PASS Business Intelligence Virtual Chapter. Making his home in Jacksonville, Florida, Devin is the Vice President of the local users’ group (JSSUG).

ffirsg03.tif

Mike Davis, MCTS, MCITP, is the Managing Project Lead at Pragmatic Works. This book is his fourth on the subject of business intelligence and specifically Integration Services. He has worked with SQL Server for almost a decade and has led many successful business intelligence projects with his clients. Mike is an experienced speaker and has presented at many events such as several SQL Server User Groups, Code Camps, SQL Saturday events, and the PASS Summit. Mike is an active member at his local user group (JSSUG) in Jacksonville, Florida. In his spare time, he likes to play darts and guitar. You can also find him on twitter @MikeDavisSQL, and his blog on MikeDavisSQL.com and BIDN.com.

ffirsg04.tif

Wayne Snyder has worked as a DBA for about 20 years, learning about databases and the data which they contain. For the past 8 years, he has been entirely focused on business intelligence, using the Microsoft BI Stack for Mariner (www.mariner-usa.com). His role at Mariner is Distinguished Architect, and in that role he spends a lot of time with Integration Services, Analysis Services, Reporting Services, and PowerPivot. There are hundreds of packages in production right now that he had a hand in making. He is a SQL Server MVP and a former President of PASS (Professional Association for SQL Server). When he is not working or writing, he plays the keyboard in a regional cover band, Soundbarrier (www.soundbarrierband.com).

ABOUT THE TECHNICAL EDITORS

Chris Albrektson is an experienced BI Consultant and Trainer currently at Pragmatic Works in Jacksonville, Florida. During his tenure at Pragmatic Works, he has designed and developed business intelligence solutions using the Microsoft Business Intelligence stack for a wide variety of customers across multiple industries. Previously, he has been a technical editor for the book Professional Microsoft SQL Server 2012 Reporting Services. Chris is an experienced speaker and has presented at many SQL Saturdays and Code Camps events across the United States. He’s also an active member of the Jacksonville SQL Server User Group (JSSUG), and is a regular blogger on BIDN.com.

Chris Price is a Senior Business Intelligence Consultant with Pragmatic Works based out of Lakeland, Florida. He has a B.S. degree in Management Information Systems and a Master’s of Business Administration, both from the University of South Florida. He began his career 12 years ago as a developer and has extensive experience across a wide range of Microsoft technologies. His current interests include ETL and Data Integration, Data Quality and Master Data Management, Analysis Services, SharePoint, and Big Data. Chris has spoken at 24 Hours of PASS and regularly presents at SQL Saturdays, Code Camps, and other community events. You can follow Chris on his blog at http://bidn.com/blogs/cpricc19e079/ or on Twitter at @BluewaterSQL.

Anthony Coleman is an experienced BI Consultant and Trainer for Pragmatic Works. Currently he designs, develops, and implements business intelligence solutions using the Microsoft BI stack. Anthony blogs at BIDN and contributes to the local SQL Server Users Group (JSSUG) in Jacksonville, Florida. In his free time, Anthony enjoys playing chess and poker.

Credits

Executive Editor

Robert Elliott

Senior Project Editor

Kevin Kent

Technical Editors

Chris Albrektson

Chris Price

Anthony Coleman

Production Editor

Christine Mugnolo

Copy Editor

Kimberly A. Cofer

Editorial Assistant

Rayna Erlick

Editorial Intern

Claire Johnson

Editorial Manager

Mary Beth Wakefield

Freelancer Editorial Manager

Rosemarie Graham

Associate Director of Marketing

David Mayhew

Marketing Manager

Ashley Zurcher

Business Manager

Amy Knies

Production Manager

Tim Tate

Vice President and Executive Group Publisher

Richard Swadley

Vice President and Executive Publisher

Neil Edde

Associate Publisher

Jim Minatel

Project Coordinator, Cover

Katie Crocker

compositor

Cody Gates, Happenstance Type-O-Rama

Proofreader

Gillian McGarvey, Word One New York

Indexer

Robert Swanson

Cover Designer

Elizabeth Brooks

Cover Image

Flying Colours Ltd / Getty Images

Vertical Websites Project Manager

Laura Moss-Hollister

Vertical Websites Supervising Producer

Rich Graves

Vertical Websites Quality Assurance

Doug Kuhn

Acknowledgments

Thanks to everyone who made this book possible. As always, I owe a huge debt to my wife Jenn for putting up with my late nights and my children, Colton, Liam, Camille, and John for being so patience with their tired dad who has always overextended. Thanks to Kevin Kent and my tech editors Chris Albrektson, Chris Price, and Anthony Coleman for keeping me in my place. Thanks also to the makers of Guinness for providing my special juice that helped me power through the book. Thanks for all the user group leaders out there who work so hard to help others become proficient in technology. You make a huge difference! Finally, thanks to my professional yodeling coach, Helga Felenstein, for getting me ready for my debut this fall. —Brian Knight

I must give thanks to God, who without in my life, I would not have such blessings. Thanks to my wife Erin who has had amazing patience during the late nights of writing, editing, and video recording. To our three children, Collin, Justin, and Lana, who have sacrificed time away from daddy. Thanks to the group of writers Brian, Mike, and Wayne, who all worked very hard while missing time with their families, too. Finally, I would like to thank my jousting mentor, Shane Adams, for showing me the way to become a real knight. Competitive jousting has always been a dream of mine, and I look forward to competing at the Liverpool Renaissance Fair. —Devin Knight

Thanks to my Pragmatic Works Team for their support in this book. Thank you to Brian Knight for giving me the opportunity of a lifetime. Thank you to Adam Jorgensen for growing me. Thank you to the Wiley team, especially Kevin and Bob. Thank you to the technical editors for their help in making this book great. Thank you to my mother for raising me to be the man I am today. Thank you to my wife and kids for being by my side. And finally, thank you to the Flying Spaghetti Monster for his noodlely blessings, ramen. —Mike Davis

This book is the culmination of the work of many people, smart people, all who have worked very hard. To Kevin Kent, the senior project editor — you have been great to work with. Kim Cofer, the copy editor, who has taken my sloppy, southern version of English and made my chapters sound intelligent. And to Chris Albrektson, Chris Price, and Anthony Coleman, whose eagle eyes have enabled the work to actually be intelligent and technically accurate. Thank you all so much. Working with you all on this book has been a great pleasure!

To the reader — Do not be afraid of SSIS. You can learn this and be successful. This book will help you get started. Do not simply download the completed packages and look through them. Go through each Try It yourself. Do not let your brain go into auto-pilot mode. Engage your brain and think about each step. As you develop your skills, you will become very comfortable with the tool. You will be able to solve difficult ETL problems using SSIS. With the combination of Integration Services and your hard work, great things can happen for you, your company, and your customers. —Wayne Snyder

Preface

If you’ve picked up this book, Knight’s Microsoft SQL Server 2012 Integration Services 24-Hour Trainer, you’ve decided to learn one of SQL Server’s most exciting applications, SQL Server Integration Services (SSIS). SSIS is a platform to move data from nearly any data source to nearly any destination and helps you by orchestrating a workflow to organize and control the execution of all these events. Most who dive into SSIS use it weekly, if not daily, to move data between partners, departments, or customers. It’s also a highly in-demand skill—even in the worst of economic environments, jobs are still posted for SSIS developers. This is because no matter what happens in an economy, people still must move and transform data.

This book, then, is your chance to start delving into this powerful and marketable application. And what’s more, this is not just a book you’re holding right now. It’s a video learning tool, as well. We became passionate about video training a number of years ago when we realized that in our own learning we required exposure to multiple teaching techniques to truly understand a topic—a fact that is especially true with tutorial books like this one. So, you’ll find hours of videos on the DVD in this book to help you learn SSIS better than reading about the topic alone could and to help demonstrate the various tutorials in the book.

Who This Book Is For

This is a beginner book and assumes only that you know SQL Server 2012 to run queries against the database engine (T-SQL skills are assumed and used throughout this book). Because this book is structured for a beginner, providing many tutorials and teaching you only what you’ll likely use at work, it is not a reference book filled with a description of every property in a given task. It instead focuses on only the essential components for you to complete your project at work or school.

What This Book Covers

This book covers SQL Server 2012 and assumes no knowledge of previous versions of SQL Server. The differences between SQL Server 2005/2008 and SQL Server 2012 mostly exist around the administration of SSIS, and there are a few new components. By the time you’ve completed this book, you’ll know how to load and synchronize database systems using SSIS by using some of the new SQL Server 2012 features. You’ll also know how to load data warehouses, which is a very hot and specialized skill. Even in warehousing, you’ll find features in the new SQL Server 2012 release that you’ll wonder how you lived without, like Change Data Capture (CDC)!

How This Book Is Structured

Our main principle in this book is to teach you only what we think you need to perform your job task. Because of that, it’s not a comprehensive reference book. You won’t find a description of every feature of SSIS in here. Instead the book blends small amounts of description, a tutorial, and videos to enhance your experience. Each lesson walks you through how to use components of SSIS and contains a tutorial. In this tutorial, called “Try It,” you can choose to read the requirements to complete the lesson, the hints of how to go about it, and begin coding, or you can read the step-by-step instructions if you learn better that way. Either way if you get stuck or want to see how one of us does the solution, watch the video on the DVD to receive further instruction.

What This Book Covers

This book contains 62 lessons, which are broken into 11 sections. The lessons are usually only a few pages long and focus on the smallest unit of work in SSIS that we could work on. Each section has a large theme around a given section in SSIS:

  • Section 1: Installation and Getting Started—This section covers why you would use SSIS and the basic installation of SSIS and the sample databases that you’ll use throughout this book. If you already have SSIS and the sample databases installed, you can review this section quickly.
  • Section 2: Control Flow—This section explains how to use tasks in the Control Flow of SSIS.
  • Section 3: Data Flow—Seventy-five percent of your time as an SSIS developer is spent in the Data Flow tab. This section focuses on the configuration of the core sources, transforms, and destinations.
  • Section 4: Making Packages Dynamic—Now that you’ve created your first package, you must make it dynamic. This section covers how you can use variables, parameters, and expressions to make your package change at run time.
  • Section 5: Common ETL Scenarios—In an effort to show you some real-world business scenarios, this section covers some of the common ETL scenarios like performing incremental loads and using SQL Server’s newest component, Data Quality Services (DQS), with SSIS.
  • Section 6: Containers—This section covers one of the key Control Flow items, containers, which control how SSIS does looping and grouping.
  • Section 7: Configuring Packages—Here you learn how to configure your packages externally through configuration files, tables, and other ways.
  • Section 8: Troubleshooting SSIS—No sooner do you have an SSIS package developed than you start experiencing problems. This section shows you how to troubleshoot these problems.
  • Section 9: Administering SSIS—Now that your package is developed, here you learn how to deploy and configure the service.
  • Section 10: Loading a Warehouse—A little more on the advanced side, this section teaches you how to load a data warehouse using SSIS.
  • Section 11: Wrap Up and Review—This section was one of our favorites to write. It contains a lesson to bring everything together and also Appendices A and B, which contain crib notes for quick reference. As trainers and consultants, we are constantly asked to leave behind a quick page of crib notes of common code. In these appendices, you find guides on when to use which SSIS components and useful solutions and code snippets that address common situations you might face.

Instructional Videos on DVD

As mentioned earlier in this preface, because we believe strongly in the value of video training, this book has an accompanying DVD containing hours of instructional video. At the end of each lesson in the book, you will find a reference to an instructional video on the DVD that accompanies that lesson. In that video, one of us will walk you through the content and examples contained in that lesson. So, if seeing something done and hearing it explained helps you understand a subject better than just reading about it does, this book and DVD combination is just the thing for you to get started with SSIS. You can also find the instructional videos available for viewing online atwww.wrox.com/go/ssis2012video.

Conventions

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.


WARNING Boxes like this one hold important, not-to-be forgotten information that is directly relevant to the surrounding text.


NOTE Notes, tips, hints, tricks, and asides to the current discussion are offset and placed in italics like this.


References like this one point you to the DVD to watch the instructional video that accompanies a given lesson.

As for styles in the text:

  • We highlight new terms and important words when we introduce them.
  • We show URLs and code within the text like so: persistence.properties.
  • We present code in the following way:
    We use a monofont type for code examples.

Supporting Packages and Code

As you work through the lessons in this book, you may choose either to type in all the code and create all the packages manually or to use the supporting packages and code files that accompany the book. All the packages, code, and other support files used in this book are available for download at www.wrox.com. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book.


NOTE Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-1-118-47958-2.

Once you download the code, just decompress it with your favorite compression tool. Alternatively, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

You will need two sample databases for the tutorial, both provided by Microsoft for use with SQL Server: AdventureWorks2012 and AdventureWorksDW2012. The two sample databases are not installed by default with SQL Server 2012. You can download versions of the sample databases used for this book at the Wrox website at www.wrox.com/go/SQLSever2012DataSets. Lesson 3 also covers how to install and configure the databases.

Errata

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata, you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.

To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the Book Search Results page, click the Errata link. On this page you can view all errata that has been submitted for this book and posted by Wrox editors.


NOTE A complete book list including links to errata is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Errata page, click the Errata Form link and complete the form to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

p2p.wrox.com

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a Web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join as well as any optional information you wish to provide and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

NOTE You can read messages in the forums without joining P2P but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.

Welcome to SSIS

SQL Server Integration Services (SSIS) is one of the most powerful applications in your arsenal for moving data in and out of various databases and files. Like the rest of the business intelligence (BI) suite that comes with SQL Server, SSIS is already included in your SQL Server license when you pay for the Standard, BI, or Enterprise editions of SQL Server. Even though SSIS is included in SQL Server, you don’t even need to have SQL Server installed to make it function. Because of that, even if your environment is not using a lot of SQL Server, you can still use SSIS as a platform for data movement.

Though ultimately this book is more interactive in nature, this introduction first walks you through a high-level tour of SSIS so you have a life preserver on prior to jumping in the pool. Each topic touched on in this introduction is covered in much more depth throughout the book in lesson form and in the supporting videos on the DVD.

Import and Export Wizard

If you need to move data quickly from almost any data source to a destination, you can use the SSIS Import and Export Wizard (shown in Figure 1). The wizard is a quick way to move the data and perform very light transformations of data, such as casting of the data into new data types. You can quickly check any table you want to transfer, as well as write a query against the data to retrieve only a selective amount of data.

Figure 1

c00f001.tif

SQL Server Data Tools

SQL Server Data Tools (SSDT) is the central tool that you’ll spend most of your time in as an SSIS developer (really as a SQL Server developer). Like the rest of SQL Server, the tool’s foundation is the Visual Studio 2010 interface (shown in Figure 2), and SSDT is installed when you install SQL Server 2012. The nicest thing about the tool is that it’s not bound to any particular SQL Server. In other words, you won’t have to connect to a SQL Server to design an SSIS package. You can design the package disconnected from your SQL Server environment and then deploy it to your target SQL Server or the filesystem on which you’d like it to run.

Architecture

Although SSIS has been a major extraction, transformation, and loading (ETL) platform for several releases of SQL Server, SQL Server 2012 has simplified the platform for developers and administrators. Because of its scalability and lower cost, SSIS is also a major player in the ETL market. What’s especially nice about SSIS is its price tag, which is free with the purchase of SQL Server. Other ETL tools can cost hundreds of thousands of dollars based on how you scale the software.

Figure 2

c00f002.tif

The SSIS architecture consists of five main components:

  • The SSIS service (there for legacy SSIS packages)
  • The SSIS runtime engine and the runtime executables
  • The SSIS catalog
  • The SSIS Data Flow engine and the Data Flow components
  • The SSIS clients

Let’s boil this down to the essentials that you need to know to do your job. The SSIS service (for packages running in legacy mode) and now the SSIS catalog handle the operational aspects of SSIS. The service is a Windows service that is installed when you install the SSIS component of SQL Server 2012, and it tracks the execution of packages (a collection of work items) and helps with the storage of the packages. You don’t need the SSIS service to run SSIS packages, but if the service is stopped, all the SSIS packages that are currently running will, in turn, stop by default.

This service is mainly used for packages stored in the older style of storing packages, the package deployment model. The new model, the project deployment model, uses something called the package catalog. The catalog is the newer way of storing packages that gives you many new options, like running packages with T-SQL. The catalog also stores basic operational information about your package.

The SSIS runtime engine and its complimentary programs actually run your SSIS packages. The engine saves the layout of your packages and manages the logging, debugging, configuration, connections, and transactions. Additionally, it manages handling your events to send you e-mails or log in to a database when an event is raised in your package. The runtime executables provide the following functionality to a package; these are discussed in more detail throughout this book:

  • Containers—Provide structure and scope to your package
  • Tasks—Provide the functionality to your package
  • Event handlers—Respond to raised events in your package
  • Precedence constraints—Provide an ordinal relationship between various items in your package

Packages

A core component of SSIS is the notion of a package. A package best parallels an executable program in Windows. Essentially, a package is a collection of tasks that execute in an orderly fashion. Precedence constraints help manage the order in which the tasks will execute. A package can be saved onto a SQL Server, which in actuality is saved in the msdb or package catalog database. It can also be saved as a .dtsx file, which is an XML structured file much like .rdl files are to Reporting Services. The end result of the package looks like what’s displayed in Figure 2, which was shown earlier.

Tasks

A task can best be described as an individual unit of work. Tasks provide functionality to your package, much like a method does in a programming language. A task can move a file, load a file into a database, send an e-mail, or write a set of .NET code for you, to name just a few of the things it can do. A small subset of the common tasks available to you comprises the following:

  • Bulk Insert Task—Loads data into a table by using the BULK INSERT SQL command.
  • Data Flow Task—This is the most important task that loads and transforms data into an OLE DB Destination.
  • Execute Package Task—Enables you to execute a package from within a package, making your SSIS packages modular.
  • Execute Process Task—Executes a program external to your package, like one to split your extract file into many files before processing the individual files.
  • Execute SQL Task—Executes a SQL statement or stored procedure.
  • File System Task—This task can handle directory operations like creating, renaming, or deleting a directory. It can also manage file operations like moving, copying, or deleting files.
  • FTP Task—Sends or receives files from an FTP site.
  • Script Task—Runs a set of VB.NET or C# coding inside a Visual Studio environment.
  • Send Mail Task—Sends a mail message through SMTP.
  • Analysis Services Processing Task—This task processes a SQL Server Analysis Services cube, dimension, or mining model.
  • Web Service Task—Executes a method on a web service.
  • WMI Data Reader Task—This task can run WQL queries against the Windows Management Instrumentation (WMI). This enables you to read the event log, get a list of applications that are installed, or determine hardware that is installed, to name a few examples.
  • WMI Event Watcher Task—This task empowers SSIS to wait for and respond to certain WMI events that occur in the operating system.
  • XML Task—Parses or processes an XML file. It can merge, split, or reformat an XML file.

These are only a few of the many tasks you have available to you. You can also write your own task or download a task from the web that does something else. Writing such a task only requires that you learn the SSIS object model and know VB.NET or C#. You can also use the Script Task to do things that the native tasks can’t do.

Data Flow Elements

Once you create a Data Flow Task, the Data Flow tab in SSDT is available to you for design. Just as the Control Flow tab handles the main workflow of the package, the Data Flow tab handles the transformation of data. Every package has a single Control Flow, but can have many Data Flows. Almost anything that manipulates data falls into the Data Flow category. You can see an example of a Data Flow in Figure 3, where data is pulled from an OLE DB Source and transformed before being written to a Flat File Destination. As data moves through each step of the Data Flow, the data changes based on what the transform does. For example, in Figure 3, a new column is derived using the Derived Column Transform and that new column is then available to subsequent transformations or to the destination.

Figure 3

c00f003.tif

You can add multiple Data Flow Tasks onto the Control Flow tab. You’ll notice that after you click on each one, it jumps to the Data Flow tab with the Data Flow Task name you selected in the drop-down box right under the tab. You can toggle between Data Flow Tasks easily by selecting the next Data Flow Task from that drop-down box.

Sources

A source is where you specify the location of your source data to pull from in the data flow. Sources will generally point to a connection manager in SSIS. By pointing them to the connection manager, you can reuse connections throughout your package because you need only change the connection in one place. Here are some of the common sources you’ll be using in SSIS:

  • OLE DB Source—Connects to nearly any OLE DB Data Source like SQL Server, Access, Oracle, or DB2, to name just a few.
  • Excel Source—Source that specializes in receiving data from Excel spreadsheets. This source also makes it easy to run SQL queries against your Excel spreadsheet to narrow the scope of the data that you want to pass through the flow.
  • Flat File Source—Connects to a delimited or fixed-width file.
  • XML Source—Can retrieve data from an XML document.
  • ODBC Source—The ODBC Source enables you to connect to common data sources that don’t use OLE DB.

Destinations

Inside the Data Flow, destinations accept the data from the data sources and from the transformations. The flexible architecture can send the data to nearly any OLE DB–compliant data source or to a flat file. Like sources, destinations are managed through the connection manager. Some of the more common destinations in SSIS and available to you are as follows:

  • Excel Destination—Outputs data from the Data Flow to an Excel spreadsheet that must already exist.
  • Flat File Destination—Enables you to write data to a comma-delimited or fixed-width file.
  • OLE DB Destination—Outputs data to an OLE DB data connection like SQL Server, Oracle, or Access.
  • SQL Server Destination—The destination that you use to write data to SQL Server most efficiently. To use this, you must run the package from the destination.

Transformations

Transformations (or transforms) are a key component to the Data Flow that change the data to a format that you’d like. For example, you may want your data to be sorted and aggregated. Two transformations can accomplish this task for you. The nicest thing about transformations in SSIS is they are all done in-memory, and because of this they are extremely efficient. Memory handles data manipulation much faster than disk IO does, and you’ll find if disk paging occurs, your package that ran in 20 minutes will suddenly take hours. Here are some of the more common transforms you’ll use on a regular basis:

  • Aggregate—Aggregates data from a transform or source similar to a GROUP BY statement in T-SQL.
  • Conditional Split—Splits the data based on certain conditions being met. For example, if the State column is equal to Florida, send the data down a different path. This transform is similar to a CASE statement in T-SQL.
  • Data Conversion—Converts a column’s data type to another data type. This transform is similar to a CAST statement in T-SQL.
  • Derived Column—Performs an in-line update to the data or creates a new column from a formula. For example, you can use this to calculate a Profit column based on a Cost and SellPrice set of columns.
  • Fuzzy Grouping—Performs data cleansing by finding rows that are likely duplicates.
  • Fuzzy Lookup—Matches and standardizes data based on fuzzy logic. For example, this can transform the name Jon to John.
  • Lookup—Performs a lookup on data to be used later in a transformation. For example, you can use this transformation to look up a city based on the ZIP code.
  • Multicast—Sends a copy of the data to an additional path in the workflow and can be used to parallelize data. For example, you may want to send the same set of records to two tables.
  • OLE DB Command—Executes an OLE DB command for each row in the Data Flow. Can be used to run an UPDATE or DELETE statement inside the Data Flow.
  • Row Count—Stores the row count from the Data Flow into a variable for later use by, perhaps, an auditing solution.
  • Script Component—Uses a script to transform the data. For example, you can use this to apply specialized business logic to your Data Flow.
  • Slowly Changing Dimension—Coordinates the conditional insert or update of data in a slowly changing dimension during a data warehouse load.
  • Sort—Sorts the data in the Data Flow by a given column and removes exact duplicates.
  • Union All—Merges multiple data sets into a single data set.
  • Unpivot—Unpivots the data from a non-normalized format to a relational format.

SSIS Capabilities Available in Editions of SQL Server 2012

The features in SSIS and SQL Server that are available to you vary widely based on what edition of SQL Server you’re using. As you can imagine, the higher-end edition of SQL Server you purchase, the more features are available. As for SSIS, you’ll have to use at least the Standard Edition to receive the bulk of the SSIS features. In the Express and Workgroup Editions, only the Import and Export Wizard is available to you. You’ll have to upgrade to the Enterprise or Developer Editions to see some features in SSIS. The advanced transformations available only with the Enterprise Edition are as follows:

  • Data Mining Query Transformation
  • Fuzzy Lookup and Fuzzy Grouping Transformations
  • Term Extraction and Term Lookup Transformations
  • Data Mining Model Training Destination
  • Dimension Processing Destination
  • Partition Processing Destination
  • Change Data Capture components
  • Higher speed data connectivity components such as connectivity to SAP or Oracle

Summary

This introduction exposed you to the SQL Server Integration Services (SSIS) architecture and some of the different elements you’ll be dealing with in SSIS. Tasks are individual units of work that are chained together with precedence constraints. Packages are executable programs in SSIS that are a collection of tasks. Finally, transformations are the Data Flow items that change the data to the form you request, such as sorting the data the way you want. Now that the overview is out of the way, it’s time to start the first section and your first set of lessons, and time for you to get your hands on SSIS.


As mentioned earlier, the print book comes with an accompanying DVD containing hours of instructional supporting video. At the end of each lesson in the book, you will find a box like this one pointing you to a video on the DVD that accompanies that lesson. In that video, one of us will walk you through the content and examples contained in that lesson. So, if seeing something done and hearing it explained helps you understand a subject better than just reading about it does, this text and video combination provides exactly what you need. There’s even an Introduction to SSIS video that you can watch to get started. Simply select the Intro to SSIS lesson on the DVD. You can also view the instructional videos online at www.wrox.com/go/ssis2012video.

Section 1

Installation and Getting Started

  • Lesson 1: Moving Data with the Import and Export Wizard
  • Lesson 2: Installing SQL Server Integration Services
  • Lesson 3: Installing the Sample Databases
  • Lesson 4: Creating a Solution and Project
  • Lesson 5: Exploring SQL Server Data Tools
  • Lesson 6: Creating Your First Package
  • Lesson 7: Upgrading Packages to SQL Server 2012
  • Lesson 8: Upgrading to the Project Deployment Model

Section 1

Installation and Getting Started

  • Lesson 1: Moving Data with the Import and Export Wizard
  • Lesson 2: Installing SQL Server Integration Services
  • Lesson 3: Installing the Sample Databases
  • Lesson 4: Creating a Solution and Project
  • Lesson 5: Exploring SQL Server Data Tools
  • Lesson 6: Creating Your First Package
  • Lesson 7: Upgrading Packages to SQL Server 2012
  • Lesson 8: Upgrading to the Project Deployment Model

Chapter 1

Moving Data with the Import and Export Wizard

The Import and Export Wizard is the easiest method to move data from sources like Excel, Oracle, DB2, SQL Server, and text files to nearly any destination. This wizard uses SSIS as a framework and can optionally save a package as its output prior to executing. The package it produces will not be the most elegant, but it can take a lot of the grunt work out of package development and provide the building blocks that are necessary for you to build the remainder of the package. Oftentimes as an SSIS developer, you’ll want to relegate the grunt work and heavy lifting to the wizard and do the more complex coding yourself. The wizard does no transformations or cleansing, but instead only moves data from point A to point B.

As with most SQL Server wizards, you have numerous ways to open the tool:

  • To open the Import and Export Wizard, right-click the database you want to import data from or export data to SQL Server Management Studio and select Tasks > Import Data (or Export Data based on what task you’re performing).
  • You can also open the wizard by right-clicking SSIS Packages in SQL Server Data Tools (SSDT) and selecting SSIS Import and Export Wizard.
  • Another common way to open it is from the Start menu under SQL Server 2012 by choosing Import and Export Data.
  • The last way to open the wizard is by typing dtswizard.exe at the command line or Run prompt.

Regardless of whether you need to import or export the data, the first few screens in the wizard look very similar.

Once the wizard comes up, you see the typical Microsoft wizard welcome screen. Click Next to begin specifying the source connection. If you opened the wizard from Management Studio by selecting Export Data, this screen is prepopulated. In this screen, you specify where your data is coming from in the Source drop-down box. Once you select the source, the rest of the options on the dialog box may vary based on the type of connection. The default source is SQL Native Client, and it looks like Figure 1-1. You have OLE DB Sources like SQL Server, Oracle, and Access available out of the box. You can also use text files and Excel files. After selecting the source, you have to fill in the provider-specific information.

Figure 1-1

c01f001.tif

For SQL Server, you must enter the server name (localhost means go to your local machine’s SQL Server instance, if applicable) and the username and password you want to use. If you’re going to connect with your Windows account, simply select Use Windows Authentication. Windows Authentication will pass your Windows local or domain credentials into the data source. Lastly, choose a database that you’d like to connect to. For most of the examples in this book, you use the AdventureWorks2012 database. You can see Lesson 3 of this book for more information on installing this sample database.


NOTE You can find the sample databases used for this book at the Wrox website at www.wrox.com/go/SQLSever2012DataSets.


NOTE Additional sources such as Sybase and DB2 can also become available if you install the vendors’ OLE DB or ODBC providers. You can download additional providers for free if you’re using Enterprise Edition by going to the SQL Server 2012 Feature Pack on the Microsoft website. You also have ODBC and ADO.NET providers available to you in SQL Server 2012.