How To Estimate Data Utilization

Recently, on a conference call presenting data growth rates and database capacity projections, a top-level executive asked, “But how much of that data are we actually using today?” The question was met with silence; unless you have rigorous auditing in place — and kudos to you if you do — it’s a difficult question to answer. But it raises the question: is there some way to glean this information from SQL Server? I think the answer is “yes,” if you make some assumptions and understand what you’re looking at.

SQL Server collects statistics about every time an index is used and how it is used (e.g. whether a user performed a seek or a scan against the index). It also provides a DMV to view these stats: sys.dm_db_index_usage_stats.
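
If you just want to eyeball the raw numbers before running the full script below, a quick ad hoc query like this (a minimal sketch; run it in the database you’re curious about) shows the read counters we’ll be rolling up:

/* Peek at the raw usage stats for the current database;
   user_seeks, user_scans, and user_lookups are the read counters
   the script below rolls up */
SELECT OBJECT_NAME(us.[object_id]) AS tableName
    , us.index_id
    , us.user_seeks
    , us.user_scans
    , us.user_lookups
    , us.last_user_seek
    , us.last_user_scan
FROM sys.dm_db_index_usage_stats AS us
WHERE us.database_id = DB_ID()
ORDER BY us.user_seeks + us.user_scans + us.user_lookups DESC;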

This DMV provides a wealth of great information, but to answer our question of “What data is actually being used?”, we have to refine our criteria. Are we talking in terms of table counts or data size? I’d argue that data size is more important than table counts; one unqueried million-row table is more wasteful than a hundred ten-row tables.

Also, are we looking at indexes or content? From a database perspective, I’m more interested in indexes: how much space are we wasting on unused indexes? To identify this, I need to look at the activity on each individual index.

From a business perspective, I would be more interested in content (i.e. tables): how much business information is being stored that no one is even looking at? To answer this question, I need to roll up all index usage to see if *any* of the indexes on a table were used. Since both were of interest to me, I decided to write queries to answer both questions.

Lastly, we need to understand the flaws with this data. Chiefly, I cannot tell whether a user requested one row from a million-row table, or if [s]he needed all of the data in the table. This is a pretty important issue, especially with large historical data stores, and it’s where I have to make the biggest assumption: if even one person looked at one row in the table, I count all pages in the table as having been accessed.

Now, you may make different decisions than I did above, and that’s fine… each environment and project has different needs. But these assumptions are very important to understanding the output of the query below:

USE master;
GO
 
/* 
    This will give you an approximation of how much data is being utilized on a server.
    Since the data is only valid as of the last server reboot, we should start off with
    an idea of how much data we've accrued.  
*/
 
/* Find out when the server was last rebooted */
-- 2008
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;
-- 2005
SELECT create_date FROM sys.databases WHERE name = 'tempdb';
 
 
/* Create a temporary table to hold our data, since we're going to iterate through databases */
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
 
CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , tableName     NVARCHAR(128)
    , indexID       INT
    , records       BIGINT
    , activity      BIGINT
    , totalPages    BIGINT
);
 
/*  
    sp_foreachdb was written by SQL MVP Aaron Bertrand and can be downloaded 
    at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/
    Alternatively, you can also use sys.sp_MSforeachdb
*/
--EXECUTE master.dbo.sp_foreachdb
EXECUTE sys.sp_MSforeachdb
'   USE ?; 
 
    -- You can glean a lot of information about historical data usage from partitions
    -- but for now, we will just roll up any partitions we may have
    WITH myCTE AS
    (
        SELECT p.[object_id] AS objectID
            , p.index_id
            , SUM(p.[rows]) AS records
            , SUM(au.total_pages) AS totalPages
        FROM sys.partitions AS p WITH (NOLOCK)
        JOIN sys.allocation_units AS au WITH (NOLOCK)
            ON p.hobt_id = au.container_id
        GROUP BY p.[object_id] 
            , p.index_id
    )
 
    -- Grab all tables and join to our usage stats DMV
    INSERT INTO #Results
    SELECT DB_NAME() AS databaseName
        , t.name
        , x.index_id
        , MAX(x.records) AS records
        , ISNULL(SUM(us.user_lookups + us.user_scans + us.user_seeks), 0) AS activity
        , SUM(x.totalPages) AS totalPages
    FROM sys.tables AS t WITH (NOLOCK)
    JOIN myCTE AS x
        ON t.[object_id] = x.objectID
    LEFT JOIN sys.dm_db_index_usage_stats AS us WITH (NOLOCK)
        ON us.[object_id] = x.objectID
        AND us.index_id = x.index_id
        AND us.database_id = DB_ID()
    GROUP BY t.name
    , x.index_id;'
 
/* Because we're looping through databases, make sure we're not missing any major ones */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);
 
/* Retrieve actual % data utilization, which is performed at the index level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , CAST(SUM(queriedPages) AS FLOAT) / NULLIF(SUM(totalPages), 0) AS '%DataUtil'
FROM (
    SELECT databaseName
        , tableName
        , indexID
        , CASE -- If we have any activity at all on an index, count all of its pages as queried
            WHEN activity = 0 THEN 0.0
            ELSE totalPages
          END AS queriedPages
        , totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
) x
GROUP BY databaseName
ORDER BY databaseName;
 
/* Retrieve % content utilization, which is performed at the table level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , CAST(SUM(queriedPages) AS FLOAT) / NULLIF(SUM(totalPages), 0) AS '%ContentUtil'
FROM (
    SELECT databaseName
        , tableName
        , MAX(records) AS records
        , CASE WHEN SUM(activity) > 0 THEN SUM(totalPages) ELSE 0 END AS queriedPages
        , SUM(totalPages) AS totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
    GROUP BY databaseName
        , tableName
) x
GROUP BY databaseName
ORDER BY databaseName;

Results:

databaseName               TotalQueriedPages   TotalPages           %DataUtil
-------------------------- ------------------- -------------------- ----------------------
Database1                  127618701           130607247            0.969619893356378
Database2                  567188              1614958              0.351209133612143
Database3                  34269036            34579469             0.991022620966216
Database4                  137970594           170733391            0.803399928206158
Database5                  74632930            101543575            0.66909214627557
Database6                  55809933            72884205             0.765734157938039
Database7                  560810026           620609815            0.902175272517656
 
databaseName               TotalQueriedPages   TotalPages           %ContentUtil
-------------------------- ------------------- -------------------- ----------------------
Database1                  127763715           130607247            0.970721679051682
Database2                  571125              1614958              0.353646967908763
Database3                  34269036            34579469             0.991022620966216
Database4                  137970921           170733391            0.803399928206158
Database5                  96144726            101543575            0.861947682777784
Database6                  72269666            72884205             0.991568146820268
Database7                  620525938           620609815            0.998240279711804

The first result set examines the utilization of indexes, and the second result set examines the utilization of data at the content (table) level. For example, if we look at Database6, we’ll see that we are only utilizing 77% of our indexes, but we’re looking at 99% of our table data. So this is a good indicator that we have some unused indexes to clean up in that database.
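
If you want to chase down that indicator, the #Results temp table we just populated already has what you need. Something like this (a minimal sketch, assuming #Results is still around in your session from the script above) lists the nonclustered indexes with no recorded activity since the last restart, largest first:

/* Nonclustered indexes with no recorded activity since the last restart;
   indexID 0 is the heap and 1 is the clustered index, so we skip those */
SELECT databaseName
    , tableName
    , indexID
    , totalPages
    , (totalPages * 8) / 1024 AS totalMB -- 8 KB per page
FROM #Results
WHERE activity = 0
    AND indexID > 1
    AND databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
ORDER BY totalPages DESC;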

Know a better way to answer this question using SQL Server DMV’s? Please leave me a comment so I can learn from your experience. :)

In unrelated news, this may be my last blog post for a little while. I’m due with my second child a week from today and expect all of my free time to be consumed by him for a little while. That and, quite frankly, I do not trust myself near a computer, especially a database, in such a sleep-deprived state. :)

11-Word Warning

Tom LaRock posted a new Meme Monday challenge: “Write a SQL blog post in 11 words or less.”

Donabel Santos tagged me, and I couldn’t resist the challenge. So here’s my entry:

Hasty coding, error prone. No backups, data loss. Company for sale.

This was inspired by the recent spate of stories I’ve heard about companies that have failed because they did not properly manage their data and databases.

I don’t know who’s been tagged or not, so I’m tagging some of my SQL Saturday Chicago friends:

Identity Columns: Are You Nearing The Limits?

I use identity columns frequently. After all, identity columns make great clustering keys. But when using identity columns, it’s important to check how many values you have left before you reach the limit of your data type. An identity column has a fixed number of values it can use, based upon whether you specified tinyint, smallint, int, or bigint when you defined the column. If you reach this limit, your inserts will begin to fail. I just finished an audit of my tables and thought I’d share the script. I should warn that this script is *not* perfect… namely, it doesn’t handle negative integer values very elegantly. It also doesn’t know if you started your seed at zero, approached your max positive limit, then reseeded to the negative limit (see my “quick and dirty fix” tip at the end of this article).

/* Define how close we are to the value limit
   before we start throwing up the red flag.
   The higher the value, the closer to the limit. */
Declare @threshold decimal(3,2) = .85;
 
/* Create a temp table */
Create Table #identityStatus
(
      database_name     varchar(128)
    , table_name        varchar(128)
    , column_name       varchar(128)
    , data_type         varchar(128)
    , last_value        bigint
    , max_value         bigint
);
 
/* Use an undocumented command to run a SQL statement
   in each database on a server */
Execute sp_msforeachdb '
    Use [?];
    Insert Into #identityStatus
    Select ''?'' As [database_name]
        , Object_Name(id.object_id, DB_ID(''?'')) As [table_name]
        , id.name As [column_name]
        , t.name As [data_type]
        , Cast(id.last_value As bigint) As [last_value]
        , Case 
            When t.name = ''tinyint''   Then 255 
            When t.name = ''smallint''  Then 32767 
            When t.name = ''int''       Then 2147483647 
            When t.name = ''bigint''    Then 9223372036854775807
          End As [max_value]
    From sys.identity_columns As id
    Join sys.types As t
        On id.system_type_id = t.system_type_id
    Where id.last_value Is Not Null';
 
/* Retrieve our results and format it all prettily */
Select database_name
    , table_name
    , column_name
    , data_type
    , last_value
    , Case 
        When last_value < 0 Then 100
        Else (1 - Cast(last_value As float(4)) / max_value) * 100 
      End As [percentLeft]
    , Case 
        When Cast(last_value As float(4)) / max_value >= @threshold
            Then 'warning: approaching max limit'
        Else 'okay'
        End As [id_status]
From #identityStatus
Order By percentLeft;
 
/* Clean up after ourselves */
Drop Table #identityStatus;

If you find yourself quickly approaching your max limit and need to implement a quick and dirty fix, you can reseed your identity column. Of course, this only works if you started at zero instead of the actual lower, negative limit.
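
For example, on an int identity column that started at 1, something along these lines (a hypothetical table name; test it somewhere safe first) buys you roughly another two billion values by reseeding to the bottom of the int range:

/* Quick and dirty: reseed a nearly-exhausted int identity column to the
   bottom of the int range. Only an option if you never used the negative
   range and nothing relies on the values being ever-increasing. */
DBCC CHECKIDENT ('dbo.myBigTable', RESEED, -2147483648);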

Effective Clustered Indexing

My first Simple-Talk article was published yesterday! I’m pretty excited about it and wanted to share the link. In the article, I give an overview of how clustered and nonclustered indexes work, and I demonstrate why clustered index best practices — narrow, unique, static, and ever-increasing — are important design considerations.

You can find the article on Simple-Talk’s website at:
http://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
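
As a tiny illustration of those guidelines (my own sketch here, not an excerpt from the article), an int identity column checks all four boxes, which is why it so often ends up as the clustering key:

/* An int identity clustering key: narrow (4 bytes), unique, static
   (never updated), and ever-increasing (new rows always land at the
   logical end of the index) */
CREATE TABLE dbo.sales
(
      sale_id   INT IDENTITY(1,1)   Not Null
    , saleDate  DATE                Not Null
    , amount    MONEY               Not Null

    CONSTRAINT PK_sales
        PRIMARY KEY CLUSTERED(sale_id)
);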

Please let me know your thoughts! :)

Yet Another PASS Summit Recap & Thoughts on PDW

The SQL blogosphere has been lit up with PASS Summit recaps.

I debated about whether or not to write my own post, until I remembered that this blog serves as a mini-journal for me too. I have a notoriously poor memory–my husband likes to say that my CPU and memory are good, but I must have an unusual clustering strategy–so maybe this blog post will be a good pointer for me when I start prepping for next year’s Summit. ;)

This was definitely the best PASS Summit conference ever. While there will always be opportunities to do things better–improvement is a never-ending process–it was clear that the organizers of this event listened to the feedback they had received the previous year. One of the best changes? Backpacks. These were very useful, as evidenced by their presence everywhere. Nice job, organizers!

My absolute favorite thing about Summit is the chance to meet and reconnect with so many amazing SQL folks. There were entirely too many people to list out, but some highlights include meeting Crys Manson, Jorge Segarra, and Karen Lopez for the first time. I also had a chance encounter with Ola Hallengren in the Sheraton elevator. Apparently we were only staying a few rooms apart this year. We ended up having a couple of really great discussions about index fragmentation, the differences between our scripts, and things we’d like to see changed in future releases of SQL Server.

I had the opportunity to sit on the panel at the WIT luncheon. All of the women on the panel were amazing, and I was honored just to be sitting at the same table as them. I was especially pleased to meet Nora Denzel, a Senior Vice President at Intuit. Intelligent, confident, and witty, she is a great role model for young technical women, myself included. I can only hope that some of her gumption rubbed off on me due to our close proximity. :) After the event, I was pleasantly surprised by how many folks–men and women both–came up to me to tell me how much they enjoyed it. Thanks to the WIT VC for organizing another great event!

The lightning talk sessions were a new feature this year, and I think I like them. The format is 7 speakers, each presenting on a topic for 5 quick minutes. Watching these sessions is kind of like skipping right to the center of a tootsie pop: all content and no fluff. The standout lightning talk presentation for me was Adam Machanic’s. It was beautifully rehearsed and choreographed. Nice job, Adam!

Another of the many highlights of the week was meeting the Microsoft execs. In addition to meeting Ted Kummert, Mark Souza, and Donald Farmer–all very nice gentlemen–I had the opportunity to speak at length with Jose Blakely about Parallel Data Warehouse (PDW). PDW, formerly codenamed Madison, was officially launched at Summit. Jose was kind enough to explain the PDW architecture, both where it came from and the vision for where it’s going. I’d attempt to regurgitate it here, but I think the probability of me misquoting would be high.

Suffice it to say, this technology has me excited. Why? Quite frankly, I think PDW will do for data warehousing what SQL Server did for databases, and what Analysis Services did for BI: make it affordable. With a compelling cost-per-terabyte, an attractive scale-out approach, and an entry point at under $1 million, we’ll see more small-to-midsized companies implementing data warehousing and business intelligence. This is good news for those of us looking for an affordable data warehouse solution and for those of us who make our living with SQL Server. And for those of you who might suggest that few companies need a data warehouse that can support multi-terabyte data volumes, I’d like to point out that just 3 or 4 years ago, 100 GB was considered a lot of data.

I spent most of my week digging into the PDW architecture. It’s not all roses–it’s a first release and, as such, is immature compared to the much older and more established data warehouse systems–but again, it has a lot going for it, not least of all its easy integration within a SQL Server environment and the relatively low cost. We’re currently investigating this as a possible data warehouse solution for our business intelligence environment, so expect to see more from me about PDW as I learn more about it.

Not attending PASS Summit? Watch LIVE streaming events FOR FREE!

If you’ve not yet heard, the annual PASS Summit is less than 2 weeks away. This is the largest SQL Server and Business Intelligence conference _in the world_, sponsored by Microsoft and Dell. The return on investment of attending this conference is pretty huge, and I highly recommend you attend if you can swing it.

I am once more fortunate to be attending and presenting at the Summit. Here’s where you can find me speaking throughout the week:

Tuesday at 3PM
Lightning Talk – Page Compression
This year, PASS has decided to try something new. A daily Lightning Talk session will be held where speakers present for 5 quick minutes on interesting SQL topics. I’ll be presenting on Tuesday with 6 amazingly talented speakers. My topic is page compression — what it is, how to do it, and (most importantly, of course) how it affects performance.
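
If you want a sneak peek before the talk, the gist fits in a couple of statements (a minimal sketch with a hypothetical table name; always check the estimated savings and test the CPU impact before compressing anything in production):

/* Estimate the savings first... */
EXECUTE sp_estimate_data_compression_savings
      @schema_name      = 'dbo'
    , @object_name      = 'myBigTable'
    , @index_id         = NULL
    , @partition_number = NULL
    , @data_compression = 'PAGE';

/* ...then rebuild the table with page compression */
ALTER TABLE dbo.myBigTable REBUILD WITH (DATA_COMPRESSION = PAGE);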

Wednesday at 11:30am in the ballroom
Women-In-Technology (WIT) Luncheon
I’ll be speaking on this year’s WIT luncheon panel, which is sponsored by GoDaddy.com. Contrary to common misconception, the luncheon is NOT just for women. In fact, men are encouraged to attend! If memory serves, last year’s luncheon had about 300 attendees, with a good mix of both genders. This year’s topic is focused on the recruitment, retention, and advancement of WIT. If you’re worried that this event will end up being a feminist bitch-fest, rest assured that’s most definitely not the case. I’ve always found the WIT events I’ve attended to be informative and thought-provoking. Plus, free lunch! :)

Thursday at 2:30PM (room 3AB)
Heaps of Trouble, Clusters of Glory – A Look At Index Internals
You can click the link above to read my abstract, but in short, I’ll be taking attendees on a journey through indexes. You’ll come away with a much better understanding of the internal structures of indexes, which should help DBA’s with database design and performance tuning.

If you’re not able to attend in person, Summit does sell DVD’s of the event afterwards, which are well worth the investment. But this year, to make the event more accessible to the community, PASS and Dell have teamed up to present live streaming of the keynotes and WIT luncheon sessions.

Here’s details of the keynotes from the PASS press release:

Ted Kummert, Senior Vice President of the Business Platform Division at Microsoft Corp., will kick off PASS Summit on November 9 by highlighting the continued innovation across Microsoft’s business and information platform. Kummert will explore Microsoft’s key technical investments, as well as Mission Critical applications and the accessibility of Business Intelligence.

Quentin Clark, General Manager of Database Systems Group at Microsoft Corp., will showcase the next version of SQL Server on November 10 and will share how features in this upcoming product milestone continue to deliver on Microsoft’s Information Platform vision. Clark will also demonstrate how developers can leverage new industry-leading tools with powerful features for data developers and a unified database development experience.

David DeWitt, Technical Fellow, Data and Storage Platform Division at Microsoft Corp., will be discussing SQL query optimization and address why it is difficult to always execute good plans in his highly anticipated technical keynote. DeWitt will also cover new technologies that offer the promise of better plans in future releases of SQL Server.

While all of the keynotes are interesting and definitely worth watching, I cannot recommend the David DeWitt keynote more highly. His keynote last Summit was outstanding. It was technical, thought provoking, and one of the best things of last year’s Summit.

You can find more information and register for the PASS Summit 2010 live streaming events at their website, www.sqlpass.org/LiveKeynotes

If you ARE attending Summit, make sure to swing by and say “hi” or message me via Twitter to see if there’s a time we can meet up. Meeting people is one of my favorite things about Summit. :)

Metadata for Table Valued Parameters

Table-valued parameters (TVP), new in SQL Server 2008, are a great feature that allows you to pass an entire set of rows to a stored procedure or function as a single parameter. Previously, the most common way of doing this was by passing and parsing XML. As I’ve previously posted, TVP’s perform an astounding 94% faster than singleton inserts and 75% faster than XML inserts. But for some reason, TVP’s still aren’t widely used and understood. In this post, I’ll walk you through how to use them and how to query the metadata for TVP’s.

I’ve previously posted about what TVP’s are and how to use them. But in honor of Halloween this week, I’ve updated my demo script:

/* Create some tables to work with */
CREATE TABLE dbo.orders
(
      order_id      INT IDENTITY(1,1)   Not Null
    , orderDate     DATE                Not Null
    , customer      VARCHAR(20)         Not Null
 
    CONSTRAINT PK_orders
        PRIMARY KEY CLUSTERED(order_id)
);
 
CREATE TABLE dbo.orderDetails
(
      orderDetail_id    INT IDENTITY(1,1)   Not Null
    , order_id          INT                 Not Null
    , lineItem          INT                 Not Null
    , product           VARCHAR(20)         Not Null
 
    CONSTRAINT PK_orderDetails
        PRIMARY KEY CLUSTERED(orderDetail_id)
 
    CONSTRAINT FK_orderDetails_orderID
        FOREIGN KEY(order_id)
        REFERENCES dbo.orders(order_id)
);
 
 
/* Create our new table types */
CREATE TYPE dbo.orderTable AS TABLE 
( 
      orderDate     DATE
    , customer      VARCHAR(20)
);
GO
 
CREATE TYPE dbo.orderDetailTable AS TABLE 
( 
      lineItem      INT
    , product       VARCHAR(20)
);
GO
 
 
/* Create a new procedure using a table-valued parameter */
CREATE PROCEDURE dbo.insert_orderTVP_sp
      @myOrderTable orderTable READONLY
    , @myOrderDetailTable orderDetailTable READONLY
AS
BEGIN
 
    SET NOCOUNT ON;
 
    DECLARE @myOrderID INT;
 
    INSERT INTO dbo.orders
    SELECT orderDate
        , customer
    FROM @myOrderTable;
 
    SET @myOrderID = SCOPE_IDENTITY();
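    -- (demo simplification: SCOPE_IDENTITY() captures only the most recent
    --  identity value, so this proc assumes one order row per call)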
 
    INSERT INTO dbo.orderDetails
    SELECT @myOrderID
        , lineItem
        , product
    FROM @myOrderDetailTable;
 
    SET NOCOUNT OFF;
 
END
GO
 
 
/* Call our new proc! */
DECLARE @myTableHeaderData AS orderTable
    , @myTableDetailData AS orderDetailTable;
 
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Zombie');
 
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 10, 'Brains' UNION ALL
SELECT 20, 'More Brains';
 
EXECUTE dbo.insert_orderTVP_sp 
      @myTableHeaderData
    , @myTableDetailData;
 
DELETE FROM @myTableHeaderData;
DELETE FROM @myTableDetailData;
 
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Vampire');
 
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 10, 'Blood Type O+' UNION ALL
SELECT 20, 'Blood Type B-' UNION ALL
SELECT 30, 'Blood Type AB+' UNION ALL
SELECT 40, 'Blood Type A+';
 
EXECUTE dbo.insert_orderTVP_sp 
      @myTableHeaderData
    , @myTableDetailData;
 
 
/* Check our data */
SELECT * FROM dbo.orders;
SELECT * FROM dbo.orderDetails;

Once you’ve run this, you should see the following data:

order_id    orderDate  customer
----------- ---------- --------------------
1           2010-10-28 Zombie
2           2010-10-28 Vampire
 
(2 row(s) affected)
 
orderDetail_id order_id    lineItem    product
-------------- ----------- ----------- --------------------
1              1           10          Brains
2              1           20          More Brains
3              2           10          Blood Type O+
4              2           20          Blood Type B-
5              2           30          Blood Type AB+
6              2           40          Blood Type A+
 
(6 row(s) affected)

Now that we’ve successfully created a couple of table types to support our TVP’s, how do we go back and find out which objects we’ve created? You can query the sys.types catalog view to find out. Just search for system_type_id 243, which identifies the record as a table type.

/* Let's check out our new data types */
SELECT name
    , system_type_id
    , is_table_type
FROM sys.types
WHERE system_type_id = 243;
GO
name                 system_type_id is_table_type
-------------------- -------------- -------------
orderTable           243            1
orderDetailTable     243            1
 
(2 row(s) affected)

Better yet, you can use the sys.table_types catalog view. This gives us the same information as sys.types, but it also gives us the type_table_object_id, which we’ll need shortly.

SELECT name
    , system_type_id
    , is_table_type
    , type_table_object_id
FROM sys.table_types;
name                 system_type_id is_table_type type_table_object_id
-------------------- -------------- ------------- --------------------
orderTable           243            1             917578307
orderDetailTable     243            1             933578364
 
(2 row(s) affected)

What if you need to look up the table type definition? You can do this using the type_table_object_id and joining to sys.columns.

SELECT tt.name AS 'table_type_name'
    , c.name AS 'column_name'
    , t.name AS 'data_type'
FROM sys.table_types AS tt
JOIN sys.columns AS c
    ON tt.type_table_object_id = c.object_id
JOIN sys.types As t
    ON c.system_type_id = t.system_type_id;
table_type_name      column_name     data_type
-------------------- --------------- ---------------
orderTable           orderDate       date
orderDetailTable     lineItem        int
orderTable           customer        varchar
orderDetailTable     product         varchar
 
(4 row(s) affected)

And last, but certainly not least, how do we see if any procs are currently using the table types? SQL Server 2008 makes this easy for us with the sys.dm_sql_referencing_entities dynamic management function.

SELECT referencing_schema_name, referencing_entity_name, referencing_id
FROM sys.dm_sql_referencing_entities ('dbo.orderTable', 'TYPE');
referencing_schema_name referencing_entity_name referencing_id
----------------------- ----------------------- --------------
dbo                     insert_orderTVP_sp      949578421
 
(1 row(s) affected)

If you’re wondering how to implement SQL Server TVP’s in your .NET code, well… I can’t tell you how to do it, but I can point you to a place that can. Stephen Forte has a post that explains how easy it is to do.

So now that you have a better understanding of how to work with TVP’s, why don’t you go implement one in your environment and see for yourself just how awesome it is? :)

Oh, and Happy Halloween!

See you in Seattle!

My Summit abstract was accepted! I’m still a little surprised, but I’m also excited (okay, and a little nervous) to once more be presenting at the PASS Summit. If you’ll be at Summit this year — and I really hope you are, as it’s well worth the time and cost — then please make sure to say “hi” if you see me wandering around. Aside from the *excellent* content, my favorite thing about Summit is getting to meet so many great people. :)

In other news, I’ve once more switched roles within GoDaddy. For the half dozen folks who’ve been following my blog from the beginning, you may remember that I originally started out on the traffic team working with tuning and VLDB’s, then took an opportunity to switch to the BI team to learn more about OLAP. Recently, a new team has been formed under the BI branch that’s tasked with developing a massive hybrid data warehouse (by hybrid, I mean half OLTP and half OLAP). “How massive is it?” Well, it’s SO massive, we’re expecting to store petabytes of data when everything is said and done. I’m happy to say I’ll be on this new team. So yes, that means we have an opening for an OLAP developer. We’re also hiring SQL Server DBA’s. We have offices in Cedar Rapids, Denver, and the Phoenix area. Send me an e-mail at michelle at sqlfool dot com if you’re interested in learning more about this great job opportunity and company.

Lastly, I want to announce that SQL Saturday 50 is now open for registration! SQL Saturday 50 will be held in Iowa City, IA on Saturday, September 18th. We’re almost at 50% of our attendance capacity, so if you’re interested in attending, please register soon.

That’s it for now. I promise that my next blog post will be uber technical. :)

SQL Saturday #50 – Call for Speakers

The Call for Speakers is now open for SQL Saturday #50, the East Iowa SQL Saturday event! This is our second time hosting a SQL Saturday, and we’re hoping to build upon the success of last year’s event. We’re looking for a wide variety of topics on SQL Server and related technologies (e.g. PowerShell, R2, LINQ). We’ve also had several requests for intro-level topics, such as beginning disaster recovery and basic performance tuning. If you’re even remotely thinking about speaking, please submit an abstract!

Last year we had about 100 folks attend from surrounding areas. This year, we’re shooting for 125 attendees, which would max out our facility’s capacity. Not sure how far away Iowa City is? It may be closer than you think. Allow me to rehash my travel times from last year’s plea for speakers:

  • Chicago – 3.5 hours
  • Omaha – 3.5 hours
  • Milwaukee – 4 hours
  • Kansas City – 4.5 hours
  • Minneapolis – 5 hours
  • St. Louis – 5 hours
  • Indianapolis – 6 hours

The event will be held on September 18th at the University of Iowa in Iowa City. You can find more information, including an abstract submission form, on our event website at http://sqlsaturday.com/50/eventhome.aspx.

Oh, and if you do make it to our SQL Saturday event, please make sure to stop me and say “hi!” :)

DELETE 5_Useless_Things FROM [SQL Server]

It’s been a while since I’ve been caught up in a round of chainblogging, the blogosphere’s version of a Facebook meme. This time, Denis Gobo tagged me in a post started by Paul Randal. Paul asked us to list the “top-5 things in SQL Server we all wish would just be removed from the product once and for all.” I reviewed other posts, and they already listed several of the same things I would have. The good news is I’m apparently not alone; the bad news is that means I need to come up with something original! So while these wouldn’t necessarily be the *first* 5 on my list, they’d still be on the list:

Default Autogrowth Options
Okay, so I lied. I’m not completely original. Yes, I know Paul Randal also commented on this one. While I said I would try to come up with only original ones, this one just has to be repeated. I’ve actually seen this option overlooked in production environments, resulting in thousands of VLF’s. It’s just a terrible default, and it needs to be changed.
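
Until that default changes, the workaround is to size your files appropriately up front and set a sensible fixed growth increment yourself; something along these lines (a sketch with hypothetical database and logical file names, and growth sizes that depend entirely on your environment):

/* Replace the default 1 MB / 10% autogrowth settings with fixed increments */
ALTER DATABASE myDatabase
    MODIFY FILE (NAME = myDatabase_data, FILEGROWTH = 256MB);

ALTER DATABASE myDatabase
    MODIFY FILE (NAME = myDatabase_log, FILEGROWTH = 256MB);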

Edit Top 200 Rows
This “feature” is just asking for trouble. Any DBA who is managing a SQL Server database should understand how to actually write insert/update/delete statements. Maybe leave the option available in SQL Express, but please remove it from SQL Server Standard & Enterprise.

Debug
There’s nothing wrong with the Debug option, but I think it should be removed as a default option for the toolbar. It’s easily mistaken for “Execute,” which I’ve seen more than one DBA do on occasion.

PIVOT
I understand the need to pivot your data, but let’s face it. PIVOT is a clunky, expensive SQL operation. Let’s move the presentation tasks to the presentation layer (.NET), and reserve the database layer for what it does best.

Update: By popular demand, I have removed PIVOT from this list. Who am I to argue with such fine folks? :)

Cursors
Okay, okay, I know I can’t actually get rid of this, BUT I think it gets abused way too much. Set-based operations, anyone?

Alrighty, now it’s my turn to tag! I’m not sure if they’ve already been hit, but I’m tagging: