Easy Way To Return Top Records

Okay, so that title may suck. I accept that. It’s late and I can’t think of anything better at the moment. :)

Bad blog title aside, let’s take a pretty common data request. You need to return the top sales performer in each department. If you’ve ever had this type of request, then you know there’s a few different ways of handling this, and it can be a little complicated. Today, Dave Carlile shared with me a new and pretty simple way of handling this with Row_Number().

The syntax for Row_Number is a little different than what you may be used to: ROW_NUMBER ( ) OVER ( [ PARTITION BY yourColumn ] ORDER BY yourColumn )

PARTITION BY is what you want to group by. This is optional.

ORDER BY is how you want to order your data before assigning a row number. This is required.

Let’s take a look at an example.

/* Create a table to play with */
Create Table dbo.sales
(
      order_id      int Identity(1,1)
    , salesPerson   varchar(20) 
    , department    varchar(20)   
    , total         money
 
    Constraint PK_sales
        Primary Key Clustered(order_id)
);
 
/* Load it up with some bogus records */
Insert Into dbo.sales
Select 'Amanda', 'Sales', 420 Union All
Select 'Barry', 'Sales', 360 Union All
Select 'Chris', 'Marketing', 398 Union All
Select 'David', 'Sales', 371 Union All
Select 'Ethan', 'Customer Support', 123 Union All
Select 'Faith', 'Sales', 206 Union All
Select 'Gavin', 'Marketing', 396 Union All
Select 'Heather', 'Marketing', 51 Union All
Select 'Iris', 'Customer Support', 79 Union All
Select 'Jamie', 'Customer Support', 242;
 
/* Examine what values are returned for each record */
Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank'
    , salesPerson
    , department
    , total
From dbo.sales;
 
/* Let's grab just the top sales performer in each department */
With myCTE As
(
    Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank'
        , salesPerson
        , department
        , total
    From dbo.sales
)
 
Select salesPerson
    , department
    , total
From myCTE
Where salesRank = 1
Order By total Desc;

Let’s take a look at the options we’ve specified for Row_Number(). Since we want to know who has the top sales, we’re going to order by [total] in descending order. We also want to assign each department its own rank, so we’re going to group (partition) by the [department] column. If we did not include the “Partition By” clause, then we’d get only 1 record returned, which would be the top overall sales person (in this case, Amanda).

Now let’s do the same thing, but this time we want to return the top 2 sales person in each department.

/* Now grab the top TWO sales performer in each department */
With myCTE As
(
    Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank'
        , salesPerson
        , department
        , total
    From dbo.sales
)
 
Select salesPerson
    , department
    , total
From myCTE
Where salesRank <= 2 -- this is the only difference
Order By department
    , salesRank;
 
/* Clean-Up! */
Drop Table dbo.sales;

That’s all there is to it! Pretty cool, huh? I haven’t gotten around to performance testing on large data sets yet, but I definitely like the simplicity of the approach.

Thanks, Dave! :)

Update: Aaron The Hobt has already done some performance testing on this very subject. The results? Not as good as I was hoping. :(

As an aside, I’m going to be participating in the Pain of the Week webcast tomorrow at 11 AM ET. This free webcast will be on index fragmentation: what is it, how to find it, and how to fix it. If you’re interested, you can register here: http://www.quest.com/events/ListDetails.aspx?ContentID=8857.

This will be only my second time speaking to an audience (the first time was yesterday at our first PASS Chapter meeting!). So if nothing else, it may be good for a few laughs. :)

Source: http://sqlfool.com/2009/03/easy-way-to-return-top-records/

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , . Bookmark the permalink.

4 Responses to Easy Way To Return Top Records

  1. Boris Kagan says:

    An alternative (ancient) way to get the same result, with probably a bit less code:

    select *
    from sales
    where sales.order_id = (
    select top 1 order_id
    from sales s2
    where s2.department = sales.department
    order by s2.total desc
    )

    If wanted more than one, change ” … = (select top 1 …” to ” … in (select top n…”

  2. TroyK says:

    Take a look at the Rank() and Dense_Rank() functions — better control over ties, etc. = more suited to your scenario.

    For fun, also check out the NTILE() function.

    http://msdn.microsoft.com/en-us/library/ms189798.aspx

    HTH,
    TroyK

  3. Boris, thanks for the comment! I think your code is returning the orders with the most sales though, which is a little different. :)

  4. Hi Troy! I have actually never had occasion to use most of those functions, including NTILE. I’ll definitely take a look, thanks! :)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>