Partition Function COUNT() OVER possible using DISTINCT



I'm trying to write the following to get a running total of distinct NumUsers, like so:

NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])

Management Studio doesn't seem too happy with this. The error disappears when I remove the DISTINCT keyword, but then it isn't a distinct count.

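DISTINCT doesn't seem to be possible within partition functions. How do I go about finding the distinct count? Do I use a more traditional method, such as a correlated subquery?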

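Looking into this a bit further, maybe these OVER functions work differently than in Oracle, and in SQL-Server they can't be used to calculate running totals.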

I've added a live example on SQLfiddle where I try to use a partition function to calculate a running total.


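COUNT with ORDER BY instead of PARTITION BY is ill-defined in 2008. I'm surprised it lets you have it at all. Per the documentation, you're not allowed an ORDER BY for an aggregate function.
Damien_The_Unbeliever, 2012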

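Yep, I think I'm getting confused with some Oracle functionality; these running totals and running counts are going to be a bit more involved.
whytheq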

Answers:



There is a very simple solution using dense_rank():

dense_rank() over (partition by [Mth] order by [UserAccountKey]) 
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc) 
- 1

This gives you exactly what you asked for: the number of distinct UserAccountKeys within each month.
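A minimal sketch for checking this trick, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later supports dense_rank() with PARTITION BY). The table user_months and its rows are hypothetical. The query collapses the per-row result with SELECT DISTINCT and compares it with an ordinary GROUP BY using COUNT(DISTINCT ...).

```python
import sqlite3

# Hypothetical sample data: (Mth, UserAccountKey); repeats within a month are intentional
ROWS = [
    (1, 10), (1, 10), (1, 11),
    (2, 10), (2, 12), (2, 13), (2, 12),
    (3, 14),
]

def distinct_counts_via_dense_rank(rows):
    # In-memory SQLite database standing in for SQL Server (an assumption of this sketch)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_months (Mth INTEGER, UserAccountKey INTEGER)")
    con.executemany("INSERT INTO user_months VALUES (?, ?)", rows)
    # Ascending rank + descending rank - 1 equals the number of distinct keys
    # in the partition, and it is the same value on every row of that partition.
    sql = """
        SELECT DISTINCT Mth,
               dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
             + dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
             - 1 AS NumUsers
        FROM user_months
        ORDER BY Mth
    """
    result = dict(con.execute(sql).fetchall())
    # Reference answer from a grouped COUNT(DISTINCT ...)
    expected = dict(con.execute(
        "SELECT Mth, COUNT(DISTINCT UserAccountKey) FROM user_months "
        "GROUP BY Mth ORDER BY Mth"
    ).fetchall())
    con.close()
    return result, expected

result, expected = distinct_counts_via_dense_rank(ROWS)
print(result)  # {1: 2, 2: 3, 3: 1}
```

Why the formula works: for a row with key k, the ascending rank counts the distinct keys less than or equal to k, and the descending rank counts the distinct keys greater than or equal to k. Together they count k itself twice, which is why 1 is subtracted.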


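One thing to be aware of with dense_rank() is that it counts NULLs, whereas COUNT(field) OVER does not. Because of this I couldn't use it in my solution, but I still think it's very clever.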
bf2020

But I was looking for a running total of distinct user accounts across the months of each year: not sure how this answers that?
whytheq

@bf2020, if UserAccountKey can contain NULLs, you need to add this term: -MAX(CASE WHEN UserAccountKey IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY Mth). The idea comes from Lars Rönnbäck's answer below. Essentially, if UserAccountKey has NULL values, you need to subtract an extra 1 from the result, because DENSE_RANK counts NULL as a value.
Vladimir Baranov
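The following sketch shows why the correction term is needed. It makes the same assumptions as before (Python's sqlite3 as a stand-in for SQL Server; hypothetical table and rows). dense_rank() puts all NULLs into one extra rank, so the uncorrected formula overcounts by one in any month that contains a NULL.

```python
import sqlite3

# Hypothetical rows: (Mth, UserAccountKey); None stands in for a NULL key
ROWS = [(1, 10), (1, None), (1, 11), (2, None), (2, None), (2, 12)]

# The plain dense_rank formula: NULL is ranked like an ordinary value
RANK_SUM = """
    dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
  + dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
  - 1
"""
# Correction term from the comment: 1 if the month contains a NULL key, else 0
NULL_FLAG = "MAX(CASE WHEN UserAccountKey IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY Mth)"

def null_safe_distinct_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_months (Mth INTEGER, UserAccountKey INTEGER)")
    con.executemany("INSERT INTO user_months VALUES (?, ?)", rows)
    naive = dict(con.execute(
        f"SELECT DISTINCT Mth, {RANK_SUM} FROM user_months ORDER BY Mth").fetchall())
    fixed = dict(con.execute(
        f"SELECT DISTINCT Mth, {RANK_SUM} - {NULL_FLAG} FROM user_months ORDER BY Mth").fetchall())
    # COUNT(DISTINCT ...) ignores NULLs, so it serves as the reference answer
    expected = dict(con.execute(
        "SELECT Mth, COUNT(DISTINCT UserAccountKey) FROM user_months "
        "GROUP BY Mth ORDER BY Mth").fetchall())
    con.close()
    return naive, fixed, expected

naive, fixed, expected = null_safe_distinct_counts(ROWS)
print(naive)     # {1: 3, 2: 2}
print(fixed)     # {1: 2, 2: 1}
print(expected)  # {1: 2, 2: 1}
```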

@ahsteele Thank you, you blew my mind and solved my problem.
Henrique Donati

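There is a discussion here of how to use this dense_rank solution when the window function has a frame. SQL Server does not allow dense_rank with a window frame: stackoverflow.com/questions/63527035/...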
K4M


Necromancing:

It is very simple to emulate COUNT DISTINCT over PARTITION BY by taking the MAX of DENSE_RANK:

;WITH baseTable AS
(
    SELECT 'RM1' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM1' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR3' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR2' AS ADR
)
,CTE AS
(
    SELECT RM, ADR, DENSE_RANK() OVER(PARTITION BY RM ORDER BY ADR) AS dr 
    FROM baseTable
)
SELECT
     RM
    ,ADR

    ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY ADR) AS cnt1 
    ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM) AS cnt2 
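    -- Not supported: SQL Server rejects DISTINCT in a windowed aggregate
    --,COUNT(DISTINCT CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY CTE.ADR) AS cntDist
    -- Emulation: the highest dense rank within each RM equals its number of distinct ADR values
    ,MAX(CTE.dr) OVER (PARTITION BY CTE.RM) AS cntDistEmu 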
FROM CTE

Note:
This assumes that the field in question is not nullable.
If the field has one or more NULL entries, you need to subtract 1.
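Here is a hedged sketch that runs the same baseTable rows through Python's sqlite3 module as a stand-in for SQL Server (the T-SQL leading semicolon before WITH is dropped). It also applies the NULL note above. The extra ('RM3', NULL) row is a hypothetical addition. The subtraction uses the same CASE-based flag suggested elsewhere in this thread.

```python
import sqlite3

# Same rows as baseTable above, plus one hypothetical NULL ADR for RM3
ROWS = [
    ("RM1", "ADR1"), ("RM1", "ADR1"),
    ("RM2", "ADR1"), ("RM2", "ADR2"), ("RM2", "ADR2"), ("RM2", "ADR3"),
    ("RM3", "ADR1"), ("RM2", "ADR1"), ("RM3", "ADR1"), ("RM3", "ADR2"),
    ("RM3", None),
]

SQL = """
WITH cte AS (
    SELECT RM, ADR, DENSE_RANK() OVER (PARTITION BY RM ORDER BY ADR) AS dr
    FROM baseTable
)
SELECT DISTINCT RM,
       -- highest dense rank = number of distinct ADR values, NULL included
       MAX(dr) OVER (PARTITION BY RM)
       -- drop the extra rank a NULL occupies, if the partition has one
     - MAX(CASE WHEN ADR IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY RM) AS cntDistEmu
FROM cte
ORDER BY RM
"""

def emulated_count_distinct(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE baseTable (RM TEXT, ADR TEXT)")
    con.executemany("INSERT INTO baseTable VALUES (?, ?)", rows)
    out = dict(con.execute(SQL).fetchall())
    con.close()
    return out

print(emulated_count_distinct(ROWS))  # {'RM1': 1, 'RM2': 3, 'RM3': 2}
```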



I use a solution similar to David's above, with an extra twist for when some rows should be excluded from the count. This assumes that [UserAccountKey] is never null.

-- subtract an extra 1 if null was ranked within the partition,
-- which only happens if there were rows where [Include] <> 'Y'
dense_rank() over (
  partition by [Mth] 
  order by case when [Include] = 'Y' then [UserAccountKey] else null end asc
) 
+ dense_rank() over (
  partition by [Mth] 
  order by case when [Include] = 'Y' then [UserAccountKey] else null end desc
)
- max(case when [Include] = 'Y' then 0 else 1 end) over (partition by [Mth])
- 1

An SQL Fiddle with an extended example can be found here.
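Here is a minimal sketch of the [Include] variant, again assuming Python's sqlite3 as a stand-in for SQL Server. The table user_months, its Include values, and the rows are hypothetical. Month 2 has no included rows. That tests the case where the whole partition collapses to a single NULL rank, so the result must be 0.

```python
import sqlite3

# Hypothetical rows: (Mth, UserAccountKey, Include); UserAccountKey is never NULL
ROWS = [
    (1, 10, "Y"), (1, 11, "Y"), (1, 12, "N"), (1, 10, "Y"),
    (2, 10, "N"), (2, 11, "N"),
    (3, 13, "Y"), (3, 14, "Y"),
]

SQL = """
SELECT DISTINCT Mth,
       dense_rank() OVER (PARTITION BY Mth
           ORDER BY CASE WHEN Include = 'Y' THEN UserAccountKey END ASC)
     + dense_rank() OVER (PARTITION BY Mth
           ORDER BY CASE WHEN Include = 'Y' THEN UserAccountKey END DESC)
       -- excluded rows all become NULL and share one rank; remove it if present
     - MAX(CASE WHEN Include = 'Y' THEN 0 ELSE 1 END) OVER (PARTITION BY Mth)
     - 1 AS NumUsers
FROM user_months
ORDER BY Mth
"""

def included_distinct_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_months (Mth INTEGER, UserAccountKey INTEGER, Include TEXT)")
    con.executemany("INSERT INTO user_months VALUES (?, ?, ?)", rows)
    result = dict(con.execute(SQL).fetchall())
    # Reference: COUNT(DISTINCT ...) skips the NULLs produced for excluded rows
    expected = dict(con.execute(
        "SELECT Mth, COUNT(DISTINCT CASE WHEN Include = 'Y' THEN UserAccountKey END) "
        "FROM user_months GROUP BY Mth ORDER BY Mth"
    ).fetchall())
    con.close()
    return result, expected

result, expected = included_distinct_counts(ROWS)
print(result)  # {1: 2, 2: 0, 3: 2}
```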


Your idea can be used to make the original dense_rank() formula (without the [Include] complication you describe in your answer) work when UserAccountKey can be NULL. Add this term to the formula: -MAX(CASE WHEN UserAccountKey IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY Mth)
Vladimir Baranov


I think the only way to do this in SQL Server 2008 R2 is with a correlated subquery or an OUTER APPLY:

SELECT  datekey,
        COALESCE(RunningTotal, 0) AS RunningTotal,
        COALESCE(RunningCount, 0) AS RunningCount,
        COALESCE(RunningDistinctCount, 0) AS RunningDistinctCount
FROM    document
        OUTER APPLY
        (   SELECT  SUM(Amount) AS RunningTotal,
                    COUNT(1) AS RunningCount,
                    COUNT(DISTINCT d2.dateKey) AS RunningDistinctCount
            FROM    Document d2
            WHERE   d2.DateKey <= document.DateKey
        ) rt;

In SQL Server 2012 this can be done with the syntax you suggested:

SELECT  datekey,
        SUM(Amount) OVER(ORDER BY DateKey) AS RunningTotal
FROM    document

However, DISTINCT is still not allowed, so if you need DISTINCT and/or upgrading isn't an option, I think OUTER APPLY is your best option.
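SQLite has no OUTER APPLY. This sketch therefore uses the correlated-subquery form mentioned at the start of this answer for the running distinct count, and SUM(...) OVER (ORDER BY ...) for the running total. It assumes Python's sqlite3 as a stand-in for SQL Server and a hypothetical document table with an added UserAccountKey column. The correlated subquery rescans earlier rows once per date, which can be slow on large tables.

```python
import sqlite3

# Hypothetical rows: (DateKey, UserAccountKey, Amount)
ROWS = [(1, 10, 5), (1, 11, 3), (2, 10, 2), (3, 12, 4), (3, 10, 1)]

SQL = """
WITH totals AS (
    -- running total: the default RANGE frame covers every row up to this DateKey
    SELECT DISTINCT DateKey,
           SUM(Amount) OVER (ORDER BY DateKey) AS RunningTotal
    FROM document
)
SELECT t.DateKey,
       t.RunningTotal,
       -- running distinct count: no windowed COUNT(DISTINCT ...), so recount
       -- all rows up to this DateKey with a correlated subquery
       (SELECT COUNT(DISTINCT d2.UserAccountKey)
          FROM document d2
         WHERE d2.DateKey <= t.DateKey) AS RunningDistinctUsers
FROM totals t
ORDER BY t.DateKey
"""

def running_stats(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE document (DateKey INTEGER, UserAccountKey INTEGER, Amount INTEGER)")
    con.executemany("INSERT INTO document VALUES (?, ?, ?)", rows)
    out = con.execute(SQL).fetchall()
    con.close()
    return out

for row in running_stats(ROWS):
    print(row)
# (1, 8, 2)
# (2, 10, 2)
# (3, 15, 3)
```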


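Cool, thanks. I found this answer, which has the OUTER APPLY option that I'll try. Have you seen the looping UPDATE approach in that answer? It's quite far out and apparently very fast. Life will be easier with 2012; is it a straight copy of Oracle?
whytheq, 2012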
Licensed under cc by-sa 3.0 with attribution required.