Accelerating ETL Data Retrieval

As I’ve mentioned previously, SQL Server 2012 Integration Services Design Patterns is now available on Amazon. PASS has invited the ENTIRE author team to give a pre-conference training session at Summit 2012. Precons are all-day training events and are held on the days before Summit. Our precon will be held on Monday, November 5th. I’ve attended several precons at Summit, and in my opinion, they provide unparalleled depth and access to the presenters. I expect our precon will be no different.

So what will I be presenting on? Good question. I’m going to talk about Accelerating ETL Data Retrieval. Specifically, I’ll discuss strategies for retrieving data for full loads vs. incremental loads, and I’ll explore some strategies for retrieving data from large tables. To clarify “large,” since everyone has a different definition, I’ve successfully employed these strategies on tables with 10 billion rows.

Now you might be reading this and thinking, “Hey, that doesn’t sound like SSIS!” Well… yes and no. While I will be discussing some strategies within SSIS, most of what I discuss will take place at the database layer. In fact, most of my content could be useful for any ETL tool. I still think it’s a good topic for this precon for three reasons. First, most ETL developers I talk with — that is, people who specialize in and are primarily tasked with ETL — overlook the data retrieval step. I frequently hear, “I’m pulling all of the data, so it doesn’t matter, I can’t get the data any faster than SELECT * FROM.” That’s not always true, and I’ll explain why in the precon. Secondly, having a thorough understanding of data retrieval techniques is important. Data retrieval is one of the most — if not the single most — expensive components of ETL. Lastly, I think there’s a huge knowledge gap in this area. While there is a lot of discussion about query optimization, it rarely has to do with the type of query patterns that ETL requires.

So that’s what I’ll be talking about and why. What do you think? Worthwhile content?

Also, this is just one of the things that will be covered in our SSIS Design Patterns precon. The rest of the author team — Andy Leonard, Matt Masson, Tim Mitchell, and Jessica Moss — is putting together great content that includes package execution, parent/child relationships, expressions and scripting, dynamic configurations, error handling, data flow internals, scalability and parallelism, automation, dynamic package generation, data warehousing patterns, and more!

As an aside, I am honored to announce that I have been re-awarded as Microsoft MVP for SQL Server. My husband took this picture of my trophy for me, which I think turned out really cool. :)

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , , , , , , , . Bookmark the permalink.

One Response to Accelerating ETL Data Retrieval

  1. Dave Clark says:

    Hi Michelle,
    Would you be able to write a post about any of the concepts that you covered in regards to “Accelerating ETL Data Retrieval”?
    Or is there additional information available on this or is it covered in the book that mention?
    Thanks,
    Dave

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>