SQL count distinct over partition


10

I have a table with two columns, and I want to count the distinct values of Col_B for each value of Col_A.

MyTable

Col_A | Col_B 
A     | 1
A     | 1
A     | 2
A     | 2
A     | 2
A     | 3
b     | 4
b     | 4
b     | 5

Expected result

Col_A   | Col_B | Result
A       | 1     | 3
A       | 1     | 3
A       | 2     | 3
A       | 2     | 3
A       | 2     | 3
A       | 3     | 3
b       | 4     | 2
b       | 4     | 2
b       | 5     | 2

I tried the following code

select *, 
count (distinct col_B) over (partition by col_A) as 'Result'
from MyTable

count (distinct col_B) does not work. How can I rewrite the count function so that it counts distinct values?

Answers:


18

This is how I'd do it:

SELECT      *
FROM        #MyTable AS mt
CROSS APPLY (   SELECT COUNT(DISTINCT mt2.Col_B) AS dc
                FROM   #MyTable AS mt2
                WHERE  mt2.Col_A = mt.Col_A
                -- GROUP BY mt2.Col_A 
            ) AS ca;

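The GROUP BY clause is redundant given the data provided in the question, but it may give you a better execution plan. See the follow-up Q&A "CROSS APPLY produces outer join".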

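If you would like this feature added to SQL Server, consider voting for the "OVER clause enhancement request - DISTINCT clause for aggregate functions" item on the feedback site.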


6

You can emulate it with dense_rank and then pick the maximum rank for each partition:

select col_a, col_b, max(rnk) over (partition by col_a)
from (
    select col_a, col_b
        , dense_rank() over (partition by col_A order by col_b) as rnk 
    from #mytable
) as t    

You would need to exclude any NULLs in col_b to get the same results as COUNT(DISTINCT).
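A minimal sketch of one way to do that while keeping the NULL rows in the output. It is an assumption, not part of the answer above: instead of filtering the rows out, it subtracts 1 from the maximum rank whenever the partition contains at least one NULL, because DENSE_RANK counts all NULLs in col_b as a single distinct group. The `Result` alias is borrowed from the question.

```sql
-- Sketch: DENSE_RANK-based distinct count that ignores NULLs in col_b.
-- Assumes the #MyTable temp table used elsewhere in this thread.
SELECT col_a,
       col_b,
       -- the highest dense rank counts every distinct value, with NULL as one group
       MAX(rnk) OVER (PARTITION BY col_a)
       -- subtract that group again if the partition holds any NULL
       - MAX(CASE WHEN col_b IS NULL THEN 1 ELSE 0 END)
             OVER (PARTITION BY col_a) AS Result
FROM (
    SELECT col_a,
           col_b,
           DENSE_RANK() OVER (PARTITION BY col_a ORDER BY col_b) AS rnk
    FROM #MyTable
) AS t;
```

If the NULL rows are not needed in the output at all, adding WHERE col_b IS NOT NULL to the derived table is the simpler route.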


6

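This is, in a way, an extension of Lennart's solution, but it is so ugly that I dare not suggest it as an edit. The goal here is to get the result without a derived table. There may never be a need for that, and given how ugly the query is, the whole endeavour may look like wasted effort. Still, I wanted to do it as an exercise, and I would now like to share my result: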

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - 1
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                  WHEN COUNT(  *  ) OVER (PARTITION BY Col_A)
                  THEN 0
                  ELSE 1
                  END
FROM
  dbo.MyTable
;

The core of the calculation is this (and I should say up front that the idea is not mine; I learned this trick elsewhere):

  DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
+ DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
- 1

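If the values in Col_B are guaranteed never to be NULL, this expression can be used without any changes. If the column can contain NULLs, however, you need to account for that, and that is exactly what the CASE expression is for. It compares the number of rows per partition with the number of non-NULL Col_B values per partition. If the two numbers differ, some rows have a NULL in Col_B, so the initial calculation (DENSE_RANK() ... + DENSE_RANK() - 1) needs to be reduced by 1.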
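To see why the formula works, it can help to look at the pieces side by side. The sketch below is illustrative only: the inline VALUES data and the column aliases RankAsc, RankDesc and NullAdjustment are assumptions made for this example, not part of the answer.

```sql
-- Sketch: the intermediate pieces of the distinct-count formula,
-- shown for a single partition that contains one NULL.
SELECT
  Col_A,
  Col_B,
  -- number of distinct values sorting at or before this row's value
  RankAsc  = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC ),
  -- number of distinct values sorting at or after this row's value
  RankDesc = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC),
  -- 1 when the partition contains a NULL in Col_B, otherwise 0
  NullAdjustment = CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                   WHEN COUNT(  *  ) OVER (PARTITION BY Col_A)
                   THEN 0
                   ELSE 1
                   END
FROM
  (VALUES ('A', 1), ('A', 2), ('A', 2), ('A', 3), ('A', NULL)) AS v (Col_A, Col_B)
;
```

The row's own value is counted in both ranks, which is why 1 is subtracted from their sum. SQL Server sorts NULLs lowest, so for the rows with Col_B = 2 the ascending rank is 3 (NULL, 1, 2) and the descending rank is 2 (3, 2). 3 + 2 - 1 = 4 distinct groups including NULL, and subtracting the adjustment of 1 leaves 3, which is what COUNT(DISTINCT Col_B) would return for this partition.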

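Note that because the - 1 is part of the core formula, I chose to leave it as it is. It can, however, be folded into the CASE expression, in a futile attempt to make the whole solution look less ugly: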

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                  WHEN COUNT(  *  ) OVER (PARTITION BY Col_A)
                  THEN 1
                  ELSE 2
                  END
FROM
  dbo.MyTable
;

This live demo at db<>fiddle.uk can be used to test both variations of the solution.


2
create table #MyTable (
Col_A varchar(5),
Col_B int
)

insert into #MyTable values ('A',1)
insert into #MyTable values ('A',1)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',3)

insert into #MyTable values ('B',4)
insert into #MyTable values ('B',4)
insert into #MyTable values ('B',5)


;with t1 as (

select t.Col_A,
       count(*) cnt
 from (
    select Col_A,
           Col_B,
           count(*) as ct
      from #MyTable
     group by Col_A,
              Col_B
  ) t
  group by t.Col_A
 )

select a.*,
       t1.cnt
  from #MyTable a
  join t1
    on a.Col_A = t1.Col_A

1

Another option if, like me, you are mildly allergic to correlated subqueries (Erik Darling's answer) and CTEs (kevinnwhat's answer).

Be aware that once NULLs are thrown into the mix, none of these may work the way you want. (But they are fairly simple to modify to taste.)

Simple case:

--ignore the existence of nulls
SELECT [mt].*, [Distinct_B].[Distinct_B]
FROM #MyTable AS [mt]

INNER JOIN(
    SELECT [Col_A], COUNT(DISTINCT [Col_B]) AS [Distinct_B]
    FROM #MyTable
    GROUP BY [Col_A]
) AS [Distinct_B] ON
    [mt].[Col_A] = [Distinct_B].[Col_A]
;

Same as above, but with comments on what to change for NULL handling:

--customizable null handling
SELECT [mt].*, [Distinct_B].[Distinct_B]
FROM #MyTable AS [mt]

INNER JOIN(
    SELECT 

    [Col_A],

    (
        COUNT(DISTINCT [Col_B])
        /*
        --uncomment if you also want to count Col_B NULL
        --as a distinct value
        +
        MAX(
            CASE
                WHEN [Col_B] IS NULL
                THEN 1
                ELSE 0
            END
        )
        */
    )
    AS [Distinct_B]

    FROM #MyTable
    GROUP BY [Col_A]
) AS [Distinct_B] ON
    [mt].[Col_A] = [Distinct_B].[Col_A]
/*
--uncomment if you also want to include Col_A when it's NULL
OR
([mt].[Col_A] IS NULL AND [Distinct_B].[Col_A] IS NULL)
*/
Licensed under cc by-sa 3.0 with attribution required.