Okay, so that title may suck. I accept that. It’s late and I can’t think of anything better at the moment. 🙂
Bad blog title aside, let’s take a pretty common data request. You need to return the top sales performer in each department. If you’ve ever had this type of request, then you know there’s a few different ways of handling this, and it can be a little complicated. Today, Dave Carlile shared with me a new and pretty simple way of handling this with Row_Number().
The syntax for Row_Number is a little different than what you may be used to: ROW_NUMBER ( ) OVER ( [ PARTITION BY yourColumn ] ORDER BY yourColumn )
PARTITION BY is what you want to group by. This is optional.
ORDER BY is how you want to order your data before assigning a row number. This is required.
Let’s take a look at an example.
/* Create a table to play with */ Create Table dbo.sales ( order_id int Identity(1,1) , salesPerson varchar(20) , department varchar(20) , total money Constraint PK_sales Primary Key Clustered(order_id) ); /* Load it up with some bogus records */ Insert Into dbo.sales Select 'Amanda', 'Sales', 420 Union All Select 'Barry', 'Sales', 360 Union All Select 'Chris', 'Marketing', 398 Union All Select 'David', 'Sales', 371 Union All Select 'Ethan', 'Customer Support', 123 Union All Select 'Faith', 'Sales', 206 Union All Select 'Gavin', 'Marketing', 396 Union All Select 'Heather', 'Marketing', 51 Union All Select 'Iris', 'Customer Support', 79 Union All Select 'Jamie', 'Customer Support', 242; /* Examine what values are returned for each record */ Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank' , salesPerson , department , total From dbo.sales; /* Let's grab just the top sales performer in each department */ With myCTE As ( Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank' , salesPerson , department , total From dbo.sales ) Select salesPerson , department , total From myCTE Where salesRank = 1 Order By total Desc; |
Let’s take a look at the options we’ve specified for Row_Number(). Since we want to know who has the top sales, we’re going to order by [total] in descending order. We also want to assign each department its own rank, so we’re going to group (partition) by the [department] column. If we did not include the “Partition By” clause, then we’d get only 1 record returned, which would be the top overall sales person (in this case, Amanda).
Now let’s do the same thing, but this time we want to return the top 2 sales person in each department.
/* Now grab the top TWO sales performer in each department */ With myCTE As ( Select ROW_NUMBER() Over(Partition By department Order By total Desc) As 'salesRank' , salesPerson , department , total From dbo.sales ) Select salesPerson , department , total From myCTE Where salesRank <= 2 -- this is the only difference Order By department , salesRank; /* Clean-Up! */ Drop Table dbo.sales; |
That’s all there is to it! Pretty cool, huh? I haven’t gotten around to performance testing on large data sets yet, but I definitely like the simplicity of the approach.
Thanks, Dave! 🙂
Update: Aaron The Hobt has already done some performance testing on this very subject. The results? Not as good as I was hoping. 🙁
As an aside, I’m going to be participating in the Pain of the Week webcast tomorrow at 11 AM ET. This free webcast will be on index fragmentation: what is it, how to find it, and how to fix it. If you’re interested, you can register here: http://www.quest.com/events/ListDetails.aspx?ContentID=8857.
This will be only my second time speaking to an audience (the first time was yesterday at our first PASS Chapter meeting!). So if nothing else, it may be good for a few laughs. 🙂
Source: http://sqlfool.com/2009/03/easy-way-to-return-top-records/
An alternative (ancient) way to get the same result, with probably a bit less code:
select *
from sales
where sales.order_id = (
select top 1 order_id
from sales s2
where s2.department = sales.department
order by s2.total desc
)
If wanted more than one, change ” … = (select top 1 …” to ” … in (select top n…”
Take a look at the Rank() and Dense_Rank() functions — better control over ties, etc. = more suited to your scenario.
For fun, also check out the NTILE() function.
http://msdn.microsoft.com/en-us/library/ms189798.aspx
HTH,
TroyK
Boris, thanks for the comment! I think your code is returning the orders with the most sales though, which is a little different. 🙂
Hi Troy! I have actually never had occasion to use most of those functions, including NTILE. I’ll definitely take a look, thanks! 🙂