SQL RANK() versus ROW_NUMBER()


189

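I'm confused about the differences between these two. Running the following SQL gets me two identical result sets. Can someone please explain the differences?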

SELECT ID, [Description], RANK()       OVER(PARTITION BY StyleID ORDER BY ID) as 'Rank'      FROM SubStyle
SELECT ID, [Description], ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) as 'RowNumber' FROM SubStyle

Answers:


221

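ROW_NUMBER: Returns a unique number for each row, starting with 1. For rows that have duplicate values, the numbers are assigned arbitrarily.

RANK: Assigns a unique number to each row, starting with 1, except for rows that have duplicate values. Those rows all receive the same rank, and a gap appears in the sequence after each group of duplicates.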


324

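You will only see the difference if you have ties within a partition for a particular ordering value.

RANK and DENSE_RANK are deterministic in this case: all rows with the same value for both the ordering and the partitioning columns end up with an equal result, whereas ROW_NUMBER arbitrarily (non-deterministically) assigns an incrementing result to the tied rows.

Example: (All rows have the same StyleID, so they are in the same partition, and within that partition the first 3 rows are tied when ordered by ID.)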

WITH T(StyleID, ID)
     AS (SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,2)
SELECT *,
       RANK() OVER(PARTITION BY StyleID ORDER BY ID)       AS 'RANK',
       ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS 'ROW_NUMBER',
       DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'DENSE_RANK'
FROM   T  

Returns

StyleID     ID       RANK      ROW_NUMBER      DENSE_RANK
----------- -------- --------- --------------- ----------
1           1        1         1               1
1           1        1         2               1
1           1        1         3               1
1           2        4         4               2

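You can see that for the three identical rows ROW_NUMBER increments, while RANK stays the same and then jumps to 4. DENSE_RANK also assigns the same rank to all three rows, but then gives the next distinct value a rank of 2.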
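The three numbering rules can also be written out by hand. The following is a minimal sketch, not part of the original answer: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (the first version with window functions) instead of SQL Server, and the list comprehensions are only an illustration of the rules, not how any engine implements them.

```python
import sqlite3

# Same data as the example above: three tied rows and one distinct row.
ordered = sorted([1, 1, 1, 2])

# ROW_NUMBER: position in the ordered list.
row_number = [i + 1 for i in range(len(ordered))]
# RANK: 1 + number of rows that sort strictly before this one.
rank = [1 + sum(1 for other in ordered if other < v) for v in ordered]
# DENSE_RANK: 1 + number of distinct values that sort strictly before this one.
dense_rank = [1 + len({other for other in ordered if other < v}) for v in ordered]
by_hand = list(zip(ordered, rank, row_number, dense_rank))

# Cross-check against the SQL from the answer, run through SQLite.
conn = sqlite3.connect(":memory:")
sql_rows = conn.execute(
    """
    WITH T(StyleID, ID) AS (
        SELECT 1, 1 UNION ALL
        SELECT 1, 1 UNION ALL
        SELECT 1, 1 UNION ALL
        SELECT 1, 2
    )
    SELECT ID,
           RANK()       OVER (PARTITION BY StyleID ORDER BY ID),
           ROW_NUMBER() OVER (PARTITION BY StyleID ORDER BY ID),
           DENSE_RANK() OVER (PARTITION BY StyleID ORDER BY ID)
    FROM T
    ORDER BY 3  -- sort by the ROW_NUMBER column
    """
).fetchall()
conn.close()

print(by_hand)   # [(1, 1, 1, 1), (1, 1, 2, 1), (1, 1, 3, 1), (2, 4, 4, 2)]
print(sql_rows)  # same rows as by_hand
```

Each tuple is (ID, RANK, ROW_NUMBER, DENSE_RANK), matching the result table above.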


25
Awesome!... Thanks for mentioning DENSE_RANK
Sandeep Thomas

7
Thanks for the great example. It helped me realize I was wrongly using the RANK() function when ROW_NUMBER() would have been more appropriate.
Ales Potocnik Hahonina, 2014

2
Seriously, this is awesome.
马特·费扎尼

35

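This article covers an interesting relationship between ROW_NUMBER() and DENSE_RANK() (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() in a SELECT DISTINCT statement, ROW_NUMBER() produces distinct values before the duplicates are removed by the DISTINCT keyword. For example, this query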

SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... might produce this result (DISTINCT has no effect):

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

Whereas this query:

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... produces what you probably want in this case:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

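Note that the ORDER BY clause of the DENSE_RANK() function needs to list all the other columns from the SELECT DISTINCT clause to work properly.

The reason is that, logically, window functions are calculated before DISTINCT is applied.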
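To make the note about listing every DISTINCT column concrete, here is a small sketch. It is not from the original answer: the extra column w and its sample rows are hypothetical, and it runs on SQLite 3.25+ through Python's sqlite3 module rather than on the databases named in this answer. The last query shows a common alternative, deduplicating in a derived table first and numbering afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: v as in the queries above, plus an extra column w.
conn.execute("CREATE TABLE t (v TEXT, w TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")],
)

# ORDER BY lists every DISTINCT column: one number per distinct (v, w) pair.
all_columns = conn.execute(
    """
    SELECT DISTINCT v, w, DENSE_RANK() OVER (ORDER BY v, w) AS rn
    FROM t
    ORDER BY v, w
    """
).fetchall()

# ORDER BY lists only v: (a, x) and (a, y) tie, so they share a number.
only_v = conn.execute(
    """
    SELECT DISTINCT v, w, DENSE_RANK() OVER (ORDER BY v) AS rn
    FROM t
    ORDER BY v, w
    """
).fetchall()

# Alternative: remove duplicates first, then number the remaining rows.
dedup_first = conn.execute(
    """
    SELECT v, w, ROW_NUMBER() OVER (ORDER BY v, w) AS rn
    FROM (SELECT DISTINCT v, w FROM t) AS d
    ORDER BY v, w
    """
).fetchall()
conn.close()

print(all_columns)  # [('a', 'x', 1), ('a', 'y', 2), ('b', 'x', 3)]
print(only_v)       # [('a', 'x', 1), ('a', 'y', 1), ('b', 'x', 2)]
print(dedup_first)  # [('a', 'x', 1), ('a', 'y', 2), ('b', 'x', 3)]
```

With only v in the ORDER BY, two different rows end up with the same number, which is usually not what a row numbering is meant to give.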

All three functions in comparison

Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):

SELECT
  v,
  ROW_NUMBER() OVER (window) row_number,
  RANK()       OVER (window) rank,
  DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v

... you'll get:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+

1
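Both ROW_NUMBER and DENSE_RANK produce values before DISTINCT is applied. In fact, all ranking functions, or any function for that matter, produce their results before DISTINCT is applied.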
Thanasis Ioannidis

1
@ThanasisIoannidis: Absolutely. I've updated my answer with a link to a blog post, where I explain the SQL operations
Lukas Eder


1

A simple query without a PARTITION BY clause:

select 
    sal, 
    RANK() over(order by sal desc) as Rank,
    DENSE_RANK() over(order by sal desc) as DenseRank,
    ROW_NUMBER() over(order by sal desc) as RowNumber
from employee 

Output:

    --------|-------|-----------|----------
    sal     |Rank   |DenseRank  |RowNumber
    --------|-------|-----------|----------
    5000    |1      |1          |1
    3000    |2      |2          |2
    3000    |2      |2          |3
    2975    |4      |3          |4
    2850    |5      |4          |5
    --------|-------|-----------|----------

0

Look at this example.

CREATE TABLE [dbo].#TestTable(
    [id] [int] NOT NULL,
    [create_date] [date] NOT NULL,
    [info1] [varchar](50) NOT NULL,
    [info2] [varchar](50) NOT NULL,
)

Insert some data

INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/2/09', 'Red', 'Yellow')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/3/09', 'Orange', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/1/09', 'Yellow', 'Blue')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/5/09', 'Blue', 'Orange')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/2/09', 'Green', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/8/09', 'Red', 'Blue')

Repeat the same values for id 1

INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')

Look at everything

SELECT * FROM #TestTable

Look at your results

SELECT Id,
    create_date,
    info1,
    info2,
    ROW_NUMBER() OVER (PARTITION BY Id ORDER BY create_date DESC) AS RowId,
    RANK() OVER(PARTITION BY Id ORDER BY create_date DESC)    AS [RANK]
FROM #TestTable

Compare the RowId and RANK columns for the duplicated row to understand the difference.
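The answer does not show the result, so here is a hedged sketch of what the final query returns. It is an adaptation, not the original script: it runs on SQLite 3.25+ via Python's sqlite3 module, uses a plain table named TestTable instead of a T-SQL temp table, stores the dates as ISO strings (assuming '1/1/09' means 1 January 2009), and renames the RANK alias to rnk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable (id INTEGER NOT NULL, create_date TEXT NOT NULL, "
    "info1 TEXT NOT NULL, info2 TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?, ?)",
    [
        (1, "2009-01-01", "Blue", "Green"),
        (1, "2009-01-02", "Red", "Yellow"),
        (1, "2009-01-03", "Orange", "Purple"),
        (2, "2009-01-01", "Yellow", "Blue"),
        (2, "2009-01-05", "Blue", "Orange"),
        (3, "2009-01-02", "Green", "Purple"),
        (3, "2009-01-08", "Red", "Blue"),
        (1, "2009-01-01", "Blue", "Green"),  # the repeated row for id 1
    ],
)

result = conn.execute(
    """
    SELECT id, create_date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY create_date DESC) AS RowId,
           RANK()       OVER (PARTITION BY id ORDER BY create_date DESC) AS rnk
    FROM TestTable
    ORDER BY id, RowId
    """
).fetchall()
conn.close()

for row in result:
    print(row)
# (1, '2009-01-03', 1, 1)
# (1, '2009-01-02', 2, 2)
# (1, '2009-01-01', 3, 3)
# (1, '2009-01-01', 4, 3)
# (2, '2009-01-05', 1, 1)
# (2, '2009-01-01', 2, 2)
# (3, '2009-01-08', 1, 1)
# (3, '2009-01-02', 2, 2)
```

Only the two identical rows for id 1 differ: ROW_NUMBER gives them 3 and 4, while RANK gives both of them 3.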


-1

Also, pay attention to the ORDER BY inside the OVER clause when using RANK (the standard AdventureWorks database is used here as an example).

SELECT as1.SalesOrderID, as1.SalesOrderDetailID,
       RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID) rank_same_as_partition,
       RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderDetailId) rank_salesorderdetailid
FROM Sales.SalesOrderDetail as1
WHERE SalesOrderId = 43659
ORDER BY SalesOrderDetailId;

Gives the result:

SalesOrderID SalesOrderDetailID rank_same_as_partition rank_salesorderdetailid
------------ ------------------ ---------------------- -----------------------
43659        1                  1                      1
43659        2                  1                      2
43659        3                  1                      3
43659        4                  1                      4
43659        5                  1                      5
43659        6                  1                      6
43659        7                  1                      7
43659        8                  1                      8
43659        9                  1                      9
43659        10                 1                      10
43659        11                 1                      11
43659        12                 1                      12

But if you change the ORDER BY to use OrderQty:

SELECT as1.SalesOrderID, as1.OrderQty,
       RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID) rank_salesorderid,
       RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.OrderQty) rank_orderqty
FROM Sales.SalesOrderDetail as1
WHERE SalesOrderId = 43659
ORDER BY OrderQty;

Gives:

SalesOrderID OrderQty rank_salesorderid rank_orderqty
------------ -------- ----------------- -------------
43659        1        1                 1
43659        1        1                 1
43659        1        1                 1
43659        1        1                 1
43659        1        1                 1
43659        1        1                 1
43659        2        1                 7
43659        2        1                 7
43659        3        1                 9
43659        3        1                 9
43659        4        1                 11
43659        6        1                 12

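Notice how the rank changes when we use OrderQty in the ORDER BY (rightmost column, second table) compared with using SalesOrderDetailID in the ORDER BY (rightmost column, first table). When the ORDER BY column is the same as the PARTITION BY column, every row in the partition ties, so RANK is 1 for all of them.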
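The same effect can be reproduced without AdventureWorks. This is a minimal sketch under stated assumptions: the table detail and its columns order_id and qty are hypothetical stand-ins, the quantities are copied from the OrderQty column of the second result above, and the query runs on SQLite 3.25+ through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Sales.SalesOrderDetail with just two columns.
conn.execute("CREATE TABLE detail (order_id INTEGER, qty INTEGER)")
quantities = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 6]
conn.executemany(
    "INSERT INTO detail VALUES (43659, ?)", [(q,) for q in quantities]
)

ranks = conn.execute(
    """
    SELECT qty,
           RANK() OVER (PARTITION BY order_id ORDER BY order_id) AS rank_same_as_partition,
           RANK() OVER (PARTITION BY order_id ORDER BY qty)      AS rank_orderqty
    FROM detail
    ORDER BY qty
    """
).fetchall()
conn.close()

# Ordering by the partition column: every row ties, so every rank is 1.
print([r[1] for r in ranks])  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Ordering by qty: ties share a rank and leave gaps behind them.
print([r[2] for r in ranks])  # [1, 1, 1, 1, 1, 1, 7, 7, 9, 9, 11, 12]
```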


-1

I haven't done anything with RANK, but I discovered this today with row_number().

select item, name, sold, row_number() over(partition by item order by sold) as row from table_name

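This results in repeating row numbers, because in my case each name holds all items and the numbering restarts at 1 for every item (each partition). Within each item, the rows are ordered by the number sold.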

+--------+------+-----+----+
|glasses |store1|  30 | 1  |
|glasses |store2|  35 | 2  |
|glasses |store3|  40 | 3  |
|shoes   |store2|  10 | 1  |
|shoes   |store1|  20 | 2  |
|shoes   |store3|  22 | 3  |
+--------+------+-----+----+
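For comparison, here is a sketch of the same data numbered with and without PARTITION BY. It is an illustration only: it assumes SQLite 3.25+ through Python's sqlite3 module, reuses the placeholder name table_name from the query above, and uses the alias rn instead of row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (item TEXT, name TEXT, sold INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("glasses", "store1", 30),
        ("glasses", "store2", 35),
        ("glasses", "store3", 40),
        ("shoes", "store2", 10),
        ("shoes", "store1", 20),
        ("shoes", "store3", 22),
    ],
)

# With PARTITION BY item, the numbering restarts at 1 for each item.
per_item = conn.execute(
    """
    SELECT item, name, sold,
           ROW_NUMBER() OVER (PARTITION BY item ORDER BY sold) AS rn
    FROM table_name
    ORDER BY item, rn
    """
).fetchall()

# Without PARTITION BY, the whole result set is one sequence.
overall = conn.execute(
    """
    SELECT item, name, sold,
           ROW_NUMBER() OVER (ORDER BY sold) AS rn
    FROM table_name
    ORDER BY rn
    """
).fetchall()
conn.close()

print([r[3] for r in per_item])  # [1, 2, 3, 1, 2, 3]
print([r[3] for r in overall])   # [1, 2, 3, 4, 5, 6]
```

If one running number across all items is what you need, dropping the PARTITION BY is one way to get it.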