How can I select the last non-NULL value of each column within a group?



I am using SQL Server 2016, and the data I am working with has the following form.

CREATE TABLE #tab (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));

INSERT INTO #tab VALUES 
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL),
    ('B','Q1',5,NULL),('B','Q2',NULL,'P'),('B','Q3',NULL,'C'),('B','Q3',10,NULL);

SELECT *
FROM    #tab;


I want to get the last non-null value of each of the columns val1 and val2, grouped by cat and ordered by t. The result I am looking for is

cat  val1 val2
A    1    P
B    10   C

The closest I have come is using LAST_VALUE while ignoring the ORDER BY, which does not work because I need the last non-null value in t order.

SELECT DISTINCT 
        cat, 
        LAST_VALUE(val1) OVER(PARTITION BY cat ORDER BY (SELECT NULL) ) AS val1,
        LAST_VALUE(val2) OVER(PARTITION BY cat ORDER BY (SELECT NULL) ) AS val2
FROM    #tab;

Result:

cat  val1 val2
A    NULL NULL
B    10   NULL

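The real table has more grouping columns like cat (date and string columns) and more val columns (date, string, and numeric columns) for which I need the last non-null value.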

Any ideas on how to make this selection?


@Vérace By t, within each cat.
Edmund

@ypercubeᵀᴹ No, there are no missing Q4 values; these t values are repeated. It is not well-behaved data.
Edmund

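OK, but in that case you have to provide an ordering that fully determines the order, for example PARTITION BY cat ORDER BY t, id. Otherwise the same query (any query) may give you different results on separate executions. If the table has only the columns you showed, I don't see how the order can be determined!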
ypercubeᵀᴹ

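@ypercubeᵀᴹ That is exactly the challenge. There is no id column in the data. There are several grouping columns, one string column that can be used to order rows within a group, and then several value columns with scattered nulls.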
Edmund

If you can't tell SQL Server deterministically what order the rows should be in, how would any consumer of this data know the difference?
Aaron Bertrand
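Whether the ordering problem raised in these comments actually matters can be checked directly: the result is only ambiguous when some (cat, t) pair holds more than one non-NULL value in the same column. A minimal sketch, assuming the question's sample data copied into a hypothetical table variable @ties_demo so it runs on its own. COUNT(column) ignores NULLs, so any row returned is a tie that ordering by t cannot break.

```sql
-- Hypothetical copy of the question's sample data.
DECLARE @ties_demo TABLE (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));
INSERT INTO @ties_demo VALUES
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL),
    ('B','Q1',5,NULL),('B','Q2',NULL,'P'),('B','Q3',NULL,'C'),('B','Q3',10,NULL);

-- COUNT(col) counts only non-NULL values, so a count above 1 means two or more
-- candidate "last" values share the same cat and t.
SELECT cat, t,
       COUNT(val1) AS nonnull_val1,
       COUNT(val2) AS nonnull_val2
FROM   @ties_demo
GROUP BY cat, t
HAVING COUNT(val1) > 1 OR COUNT(val2) > 1;
```

On the sample data this returns no rows, so the answers below agree on it; on the real table, any rows it returns are exactly the groups where the comments' warning applies.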

Answers:



Using the concatenation technique from Itzik Ben-Gan's "The Last non NULL Puzzle", a query for your sample table and column data types would look like this.

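select T.cat,
       -- key = t in bytes 1-2, val1 in bytes 3-6; extract bytes 3-6 of the max key
       cast(substring(
                     max(cast(T.t as binary(2)) + cast(T.val1 as binary(4))),
                     3,
                     4
                     ) as int) as val1,
       -- key = t in bytes 1-2, val2 in byte 3; extract byte 3 of the max key
       cast(substring(
                     max(cast(T.t as binary(2)) + cast(T.val2 as binary(1))),
                     3,
                     1
                     ) as char(1)) as val2
from #tab as T
group by T.cat;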


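Another way to write this query splits the steps into CTEs to show more clearly what is going on. It produces exactly the same execution plan as the query above.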

with C1 as
(
  -- Concatenate the ordering column with the value column
  select T.cat,
        cast(T.t as binary(2)) + cast(T.val1 as binary(4)) as val1,
        cast(T.t as binary(2)) + cast(T.val2 as binary(1)) as val2
  from #tab as T
),
C2 as
(
  -- Get the max concatenated value per group
  select C1.cat,
         max(C1.val1) as val1,
         max(C1.val2) as val2
  from C1
  group by C1.cat
)
-- Extract the value from the concatenated column
select C2.cat,
       cast(substring(C2.val1, 3, 4) as int) as val1,
       cast(substring(C2.val2, 3, 1) as char(1)) as val2
from C2;

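The solution relies on the fact that concatenating NULL with anything yields NULL; see SET CONCAT_NULL_YIELDS_NULL (Transact-SQL). A row whose value is NULL therefore produces a NULL key, MAX ignores NULLs, and because t fills the leading bytes, the largest remaining key comes from the latest t that has a non-null value.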
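To see the keys that MAX actually compares, the steps can be run one at a time. A minimal sketch, assuming a hypothetical table variable @concat_demo holding only category A of the sample data, and assuming (as the answer does) that the bytes of t sort in the intended order, which holds for fixed-length ASCII values such as 'Q1' to 'Q3'.

```sql
-- Hypothetical copy of category 'A' from the question's sample data.
DECLARE @concat_demo TABLE (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));
INSERT INTO @concat_demo VALUES
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL);

-- Step 1: per-row keys. A NULL val1 turns the whole key into NULL.
-- Q1 -> 0x513100000002, Q2 -> NULL, Q3 -> 0x513300000001 and NULL
SELECT t, val1,
       CAST(t AS binary(2)) + CAST(val1 AS binary(4)) AS key_val1
FROM   @concat_demo
ORDER BY t;

-- Step 2: MAX skips the NULL keys; t in the leading bytes decides the winner.
-- Step 3: bytes 3-6 of the winning key hold the original INT value.
SELECT MAX(CAST(t AS binary(2)) + CAST(val1 AS binary(4))) AS max_key,
       CAST(SUBSTRING(MAX(CAST(t AS binary(2)) + CAST(val1 AS binary(4))), 3, 4) AS int) AS last_val1
FROM   @concat_demo;
```

The second query should return max_key 0x513300000001 and last_val1 1, matching the expected result for cat A.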


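Nicely distilled, Mikael. This solution saved me a lot of time, although at first I found the end of Itzik's article confusing. He labels it "Step 2" when it is really more of an implementation of the logic from step 1.
pimbrouwers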


Just add a NULL check to the window's ORDER BY so that non-NULL rows sort first, and FIRST_VALUE does the rest:

SELECT DISTINCT 
        cat, 
        FIRST_VALUE(val1) OVER(PARTITION BY cat ORDER BY CASE WHEN val1 is NULL then 0 else 1 END DESC, t desc) AS val1,
        FIRST_VALUE(val2) OVER(PARTITION BY cat ORDER BY CASE WHEN val2 is NULL then 0 else 1 END DESC, t desc) AS val2
FROM    #tab
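The question's original LAST_VALUE attempt can be repaired the same way. This is a sketch of an equivalent rewrite, not part of the answer above: with an ORDER BY, LAST_VALUE's default frame ends at the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so it needs an explicit full-partition ROWS frame, and the NULL check is sorted ascending so non-NULL rows come last. The table variable @lv_demo is a hypothetical copy of the sample data.

```sql
-- Hypothetical copy of the question's sample data.
DECLARE @lv_demo TABLE (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));
INSERT INTO @lv_demo VALUES
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL),
    ('B','Q1',5,NULL),('B','Q2',NULL,'P'),('B','Q3',NULL,'C'),('B','Q3',10,NULL);

SELECT DISTINCT
        cat,
        -- NULL rows sort first (0), non-NULL rows last (1), then by t ascending;
        -- the full frame lets LAST_VALUE see the whole partition.
        LAST_VALUE(val1) OVER (PARTITION BY cat
                               ORDER BY CASE WHEN val1 IS NULL THEN 0 ELSE 1 END, t
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val1,
        LAST_VALUE(val2) OVER (PARTITION BY cat
                               ORDER BY CASE WHEN val2 IS NULL THEN 0 ELSE 1 END, t
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS val2
FROM    @lv_demo;
```

On the sample data this should return A, 1, P and B, 10, C, the same as the FIRST_VALUE version. Like every answer here, it is only deterministic when no two non-NULL values in a column share the same cat and t.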


This should do it: row_number() and a join.

With ordering this weak, you have to hope that each column has only one non-null value within Q3.

declare @t TABLE (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));
INSERT INTO @t VALUES 
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL),
    ('B','Q1',5,NULL),('B','Q2',NULL,'P'),('B','Q3',NULL,'C'),('B','Q3',10,NULL);

--SELECT *
--     , row_number() over (partition by cat order by t) as rn
--FROM   @t
--where val1 is not null or val2 is not null;

select t1.cat, t1.val1, t2.val2 
from  ( SELECT t.cat, t.val1
             , row_number() over (partition by cat order by t desc) as rn
        FROM   @t t
        where val1 is not null 
       ) t1
join   ( SELECT t.cat, t.val2
             , row_number() over (partition by cat order by t desc) as rn
        FROM   @t t
        where val2 is not null 
       ) t2
   on t1.cat = t2.cat
  and t1.rn = 1
  and t2.rn = 1
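One limitation of the join above: because it is an inner join on cat, a cat whose val1 or val2 is NULL on every row drops out of the result. A sketch of a single-pass variant (an alternative, not the answerer's code) ranks each column separately and then picks each column's rn = 1 value with conditional aggregation, so such a cat survives with NULL. The table variable @rn_demo and the extra category 'C' are hypothetical, added only to show that case.

```sql
-- Question's sample data plus a hypothetical category 'C' whose val2 is always NULL.
DECLARE @rn_demo TABLE (cat CHAR(1), t CHAR(2), val1 INT, val2 CHAR(1));
INSERT INTO @rn_demo VALUES
    ('A','Q1',2,NULL),('A','Q2',NULL,'P'),('A','Q3',1,NULL),('A','Q3',NULL,NULL),
    ('B','Q1',5,NULL),('B','Q2',NULL,'P'),('B','Q3',NULL,'C'),('B','Q3',10,NULL),
    ('C','Q1',7,NULL);

WITH ranked AS (
    SELECT cat, val1, val2,
           -- Non-NULL val1 rows first, latest t first: rn1 = 1 is the last non-NULL val1.
           ROW_NUMBER() OVER (PARTITION BY cat
                              ORDER BY CASE WHEN val1 IS NULL THEN 1 ELSE 0 END, t DESC) AS rn1,
           -- The same ranking, computed independently for val2.
           ROW_NUMBER() OVER (PARTITION BY cat
                              ORDER BY CASE WHEN val2 IS NULL THEN 1 ELSE 0 END, t DESC) AS rn2
    FROM @rn_demo
)
-- One row per cat; MAX only picks up the single value flagged with rn = 1.
SELECT cat,
       MAX(CASE WHEN rn1 = 1 THEN val1 END) AS val1,
       MAX(CASE WHEN rn2 = 1 THEN val2 END) AS val2
FROM   ranked
GROUP BY cat;
```

On this data the expected result is A 1 P, B 10 C, and C 7 NULL, whereas the join version returns only A and B. Each additional value column costs one more ROW_NUMBER rather than one more join.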
Licensed under cc by-sa 3.0 with attribution required.