SQL OVER() clause - when and why is it useful?


169
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);

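I have read about this clause, but I don't understand why I need it. What does the OVER function do? What does PARTITION BY do? Why can't I just write the query with GROUP BY SalesOrderID?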


30
Whichever RDBMS you use, the Postgres tutorial may be helpful. It has examples; it helped me.
Andrew Lazarus

Answers:


144

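You can use GROUP BY SalesOrderID. The difference is that with GROUP BY, any column not listed in the GROUP BY can only appear as an aggregated value.

In contrast, with windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, you can retrieve the individual OrderQty values together with their sums, counts, averages, etc. over the group of rows sharing the same SalesOrderID, which is what your example query does.

Here is a practical example of why windowed aggregates are so handy. Suppose you need to calculate what percentage of its total each value represents. Without windowed aggregates, you would first have to derive a list of aggregated values and then join it back to the original rowset, like this: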

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]

Now look at how you can do the same with a windowed aggregate:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig

Much easier and cleaner, isn't it?
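To check that the two queries really return the same thing, here is a minimal sketch that runs both against an in-memory SQLite database from Python. The sample rows are made up for illustration, and it assumes the SQLite bundled with your Python is 3.25 or newer (the first version with window functions). The column is quoted as "Partition" rather than [Partition] only because that is the standard SQL spelling.

```python
import sqlite3

# Hypothetical sample data standing in for OriginalRowset.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE OriginalRowset ("Partition" TEXT, Value INTEGER)')
conn.executemany(
    "INSERT INTO OriginalRowset VALUES (?, ?)",
    [("A", 10), ("A", 30), ("B", 5), ("B", 15)],
)

# Approach 1: aggregate in a derived table, then join back to the rows.
join_sql = """
SELECT orig."Partition", orig.Value,
       orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT "Partition", SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY "Partition"
  ) agg ON orig."Partition" = agg."Partition"
ORDER BY orig."Partition", orig.Value
"""

# Approach 2: windowed aggregate, no join needed.
window_sql = """
SELECT "Partition", Value,
       Value * 100.0 / SUM(Value) OVER (PARTITION BY "Partition") AS ValuePercent
FROM OriginalRowset
ORDER BY "Partition", Value
"""

join_rows = conn.execute(join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(window_rows)
```

Both result sets come out identical; the windowed version simply skips the self-join.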


68

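The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use a GROUP BY or not.

Example: get the count per SalesOrderID and the count of all rows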

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
GROUP BY
     SalesOrderID, ProductID, OrderQty

Get different COUNTs, without GROUP BY

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct'
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
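If you don't have AdventureWorks at hand, the idea can be tried on a tiny stand-in table. This is only a sketch: the three rows below are invented, and it assumes SQLite 3.25+ behind Python's sqlite3 module. Each COUNT gets its own window, so one row can carry a per-order count, a per-product count and a grand total at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of Sales.SalesOrderDetail.
conn.execute(
    "CREATE TABLE SalesOrderDetail "
    "(SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [(1, 100, 2), (1, 200, 1), (2, 100, 4)],
)

rows = conn.execute("""
SELECT SalesOrderID, ProductID, OrderQty,
       COUNT(OrderQty) OVER (PARTITION BY SalesOrderID) AS CountQtyPerOrder,
       COUNT(OrderQty) OVER (PARTITION BY ProductID)    AS CountQtyPerProduct,
       COUNT(*)        OVER ()                          AS CountAllAgain
FROM SalesOrderDetail
ORDER BY SalesOrderID, ProductID
""").fetchall()

for row in rows:
    print(row)
```

No row is collapsed: three rows go in, three rows come out, each annotated with three differently scoped counts.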

47

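If you only wanted to GROUP BY the SalesOrderID, then you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.

The PARTITION BY clause lets you break up your aggregate functions. One obvious and useful example would be if you wanted to generate line numbers for the order lines on an order: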

SELECT
    O.order_id,
    O.order_date,
    ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no,
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id

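(My syntax might be slightly off. In SQL Server, for example, ROW_NUMBER() requires an ORDER BY inside the OVER clause, so add one on whatever column defines the line order.)

You would then get results like: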

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1
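Here is a runnable sketch of the same idea using SQLite through Python (assuming SQLite 3.25+). The line_id column is my own assumption, not part of the query above; it is there only so that ORDER BY inside OVER has something deterministic to sort on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; line_id is an assumed column that fixes the line order.
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE Order_Lines (
    line_id INTEGER PRIMARY KEY, order_id INTEGER, product_id INTEGER
);
INSERT INTO Orders VALUES (1, '2011-05-02'), (2, '2011-05-12');
INSERT INTO Order_Lines VALUES
    (1, 1, 5), (2, 1, 4), (3, 1, 7), (4, 2, 8), (5, 2, 1);
""")

rows = conn.execute("""
SELECT O.order_id, O.order_date,
       ROW_NUMBER() OVER (PARTITION BY O.order_id
                          ORDER BY OL.line_id) AS line_item_no,
       OL.product_id
FROM Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id
ORDER BY O.order_id, line_item_no
""").fetchall()

for row in rows:
    print(row)
```

The numbering restarts at 1 for every order_id, which is the effect PARTITION BY gives you.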

42

Let me explain with an example, and you will be able to see how it works.

Assume you have the following table, DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow

Run the SQL below

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT

The result is as follows

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
----------------------------------------------------
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2

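See what happened?

You are able to count without a GROUP BY on YEAR and still have the count matched to every row.

Another interesting way to get the same result is a WITH clause, as below. WITH works as an inline view and can simplify queries, especially complex ones, although that is not the case here since I am only trying to show the usage.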

 WITH EQ AS
  ( SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
  )
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT,
  EQ
WHERE EQ.YEAR2=DIM_EQUIPMENT.YEAR;
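To convince yourself that the OVER version and the WITH version agree, here is a small sketch that loads the three rows from the table above into an in-memory SQLite database and runs both. It assumes SQLite 3.25+; the ORDER BY VIN is added only to make the two results directly comparable, and the comma join is written as an explicit JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIM_EQUIPMENT "
    "(VIN TEXT, MAKE TEXT, MODEL TEXT, YEAR INTEGER, COLOR TEXT)"
)
conn.executemany(
    "INSERT INTO DIM_EQUIPMENT VALUES (?, ?, ?, ?, ?)",
    [
        ("1234ASDF", "Ford", "Taurus", 2008, "White"),
        ("1234JKLM", "Chevy", "Truck", 2005, "Green"),
        ("5678ASDF", "Ford", "Mustang", 2008, "Yellow"),
    ],
)

# Windowed count: one pass, no join.
rows_window = conn.execute("""
SELECT VIN, MAKE, MODEL, YEAR, COLOR,
       COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT
ORDER BY VIN
""").fetchall()

# Same result via a CTE that is grouped and then joined back.
rows_cte = conn.execute("""
WITH EQ AS (
    SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2
    FROM DIM_EQUIPMENT
    GROUP BY YEAR
)
SELECT VIN, MAKE, MODEL, YEAR, COLOR, COUNT2
FROM DIM_EQUIPMENT
JOIN EQ ON EQ.YEAR2 = DIM_EQUIPMENT.YEAR
ORDER BY VIN
""").fetchall()

print(rows_window == rows_cte)
```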

17

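The OVER clause, when combined with PARTITION BY, states that the preceding function call must be done analytically, by evaluating the rows returned by the query. Think of it as an inline GROUP BY statement.

OVER (PARTITION BY SalesOrderID) states that, for the SUM, AVG, etc. function, the value is returned OVER a subset of the records returned by the query, and that subset is PARTITIONed BY the foreign key SalesOrderID.

So we will SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column will be named 'Total'.

This is a MUCH more efficient means than using multiple inline views to find the same information. You can put this query inside an inline view and then filter on Total.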

SELECT ...
FROM (your query) inlineview
WHERE Total < 200
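The wrapping is needed because a window function cannot be referenced in the WHERE clause of the same query level; it is computed after WHERE has been applied. A minimal sketch of the inline-view pattern follows, on invented rows and assuming SQLite 3.25+ via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of Sales.SalesOrderDetail.
conn.execute(
    "CREATE TABLE SalesOrderDetail "
    "(SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [(1, 100, 150), (1, 200, 100), (2, 100, 50), (2, 300, 20)],
)

rows = conn.execute("""
SELECT SalesOrderID, ProductID, OrderQty, Total
FROM (
    -- inner query computes the windowed total per order
    SELECT SalesOrderID, ProductID, OrderQty,
           SUM(OrderQty) OVER (PARTITION BY SalesOrderID) AS Total
    FROM SalesOrderDetail
) inlineview
WHERE Total < 200          -- outer query filters on it
ORDER BY SalesOrderID, ProductID
""").fetchall()

print(rows)
```

Order 1 totals 250 and is filtered out; both detail rows of order 2 (total 70) survive.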

2
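  • Also called the query partition clause.
  • Similar to the GROUP BY clause

    • Breaks data up into chunks (or partitions)
    • Partitions are separated by partition boundaries
    • The function is evaluated within each partition
    • It is re-initialised when a partition boundary is crossed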

Syntax:
function(...) OVER (PARTITION BY col1, col3, ...)

  • Functions

    • Familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc.
    • New functions as well (e.g. ROW_NUMBER(), RATIO_TO_REPORT(), etc.)
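The "re-initialised when crossing a partition boundary" point is easiest to see with a running total. The sketch below is an illustration on a made-up table t (the names grp, seq and amount are hypothetical) and assumes SQLite 3.25+ via Python. It uses SUM() rather than RATIO_TO_REPORT(), which as far as I know is Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two partitions ('a' and 'b'), two rows each.
conn.execute("CREATE TABLE t (grp TEXT, seq INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("b", 1, 5), ("b", 2, 7)],
)

rows = conn.execute("""
SELECT grp, seq, amount,
       SUM(amount) OVER (PARTITION BY grp ORDER BY seq) AS running_total
FROM t
ORDER BY grp, seq
""").fetchall()

for row in rows:
    print(row)
```

The running total climbs to 30 inside partition 'a' and then starts again from 5 when partition 'b' begins.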


More info with examples: http://msdn.microsoft.com/en-us/library/ms189461.aspx


-3
prkey   whatsthat               cash   
890    "abb                "   32  32
43     "abbz               "   2   34
4      "bttu               "   1   35
45     "gasstuff           "   2   37
545    "gasz               "   5   42
80009  "hoo                "   9   51
2321   "ibm                "   1   52
998    "krk                "   2   54
42     "kx-5010            "   2   56
32     "lto                "   4   60
543    "mp                 "   5   65
465    "multipower         "   2   67
455    "O.N.               "   1   68
7887   "prem               "   7   75
434    "puma               "   3   78
23     "retractble         "   3   81
242    "Trujillo's stuff   "   4   85

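That is the result of the query. The source table is the same, minus the last column. That column is a running (cumulative) sum of the third one, cash.

Query: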

SELECT prkey,whatsthat,cash,SUM(cash) over (order by whatsthat)
    FROM public.iuk order by whatsthat,prkey
    ;

(the table is named public.iuk)
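A sketch of the same running sum on the first four rows of the output above, run against SQLite from Python (assuming SQLite 3.25+; the table is created as plain iuk because there is no public schema here, and the running alias is my own). The rows are inserted out of order on purpose, to show that the ORDER BY inside OVER is what drives the accumulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iuk (prkey INTEGER, whatsthat TEXT, cash INTEGER)")
# First four rows of the sample output, deliberately shuffled.
conn.executemany(
    "INSERT INTO iuk VALUES (?, ?, ?)",
    [(45, "gasstuff", 2), (890, "abb", 32), (4, "bttu", 1), (43, "abbz", 2)],
)

rows = conn.execute("""
SELECT prkey, whatsthat, cash,
       SUM(cash) OVER (ORDER BY whatsthat) AS running
FROM iuk
ORDER BY whatsthat, prkey
""").fetchall()

for row in rows:
    print(row)
```

One thing to watch: with only ORDER BY in the window, the default frame includes all rows that tie on the sort key, so duplicate whatsthat values would share one running value. Adding a tiebreaker such as prkey to the window's ORDER BY avoids that.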

sql version:  2012

That is only slightly above dBase (1986) level; I don't know why it took 25+ years to get this done.

Licensed under cc by-sa 3.0 with attribution required.