Trying to find the last time that a value has changed


26

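I have a table that has an ID, a value, and a date. There are many IDs, values, and dates in this table.

Records are inserted into this table periodically. The ID will always stay the same, but occasionally the value will change.

How can I write a query that gives me the ID plus the most recent time the value changed? Note: the value will always increase.

From this sample data: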

  Create Table Taco
 (  Taco_ID int,
    Taco_value int,
    Taco_date datetime)

Insert INTO Taco 
Values (1, 1, '2012-07-01 00:00:01'),
        (1, 1, '2012-07-01 00:00:02'),
        (1, 1, '2012-07-01 00:00:03'),
        (1, 1, '2012-07-01 00:00:04'),
        (1, 2, '2012-07-01 00:00:05'),
        (1, 2, '2012-07-01 00:00:06'),
        (1, 2, '2012-07-01 00:00:07'),
        (1, 2, '2012-07-01 00:00:08')

The result should be:

Taco_ID      Taco_date
1            2012-07-01 00:00:05

(Because 00:00:05 was the last time Taco_value changed.)


2
I assume taco has nothing to do with food?
Kermit, 2013

5
I was hungry and wanted some tacos. I just needed a name for the sample table.
SqlSandwiches

8
Did you choose your username on a similar basis?
Martin Smith

1
Quite possibly.
SqlSandwiches, 2013

Answers:


13

These two queries rely on the assumption that Taco_value always increases over time.

;WITH x AS
(
  SELECT Taco_ID, Taco_date,
    dr = ROW_NUMBER() OVER (PARTITION BY Taco_ID, Taco_Value ORDER BY Taco_date),
    qr = ROW_NUMBER() OVER (PARTITION BY Taco_ID ORDER BY Taco_date)
  FROM dbo.Taco
), y AS
(
  SELECT Taco_ID, Taco_date,
    rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID, dr ORDER BY qr DESC)
  FROM x WHERE dr = 1
)
SELECT Taco_ID, Taco_date
FROM y 
WHERE rn = 1;

An alternative with less window-function madness:

;WITH x AS
(
  SELECT Taco_ID, Taco_value, Taco_date = MIN(Taco_date)
  FROM dbo.Taco
  GROUP BY Taco_ID, Taco_value
), y AS
(
  SELECT Taco_ID, Taco_date, 
    rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID ORDER BY Taco_date DESC)
  FROM x
)
SELECT Taco_ID, Taco_date FROM y WHERE rn = 1;

Examples at SQLfiddle


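Update

For those keeping track, there was some contention over what happens if Taco_value can ever repeat. If it can go from 1 to 2 and then back to 1 for any given Taco_ID, the queries above will not work. Here is a solution for that case. It may not be quite the gaps-and-islands technique that someone like Itzik Ben-Gan could dream up, and it may not be relevant to the OP's scenario, but it may be relevant to a future reader. It is a little more complex, and I also added an additional case: a Taco_ID that only ever has one Taco_value.

If you want to include the first row for any ID whose value does not change at all in the entire set: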

;WITH x AS
(
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY Taco_ID ORDER BY Taco_date DESC)
  FROM dbo.Taco
), rest AS (SELECT * FROM x WHERE rn > 1)
SELECT  
  main.Taco_ID, 
  Taco_date = MIN(CASE 
    WHEN main.Taco_value = rest.Taco_value 
    THEN rest.Taco_date ELSE main.Taco_date 
  END)
FROM x AS main LEFT OUTER JOIN rest
ON main.Taco_ID = rest.Taco_ID AND rest.rn > 1
WHERE main.rn = 1
AND NOT EXISTS 
(
  SELECT 1 FROM rest AS rest2
   WHERE Taco_ID = rest.Taco_ID
   AND rn < rest.rn
   AND Taco_value <> rest.Taco_value
) 
GROUP BY main.Taco_ID;

If you want to exclude those rows, it is a little more complex, but still only minor changes:

;WITH x AS
(
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY Taco_ID ORDER BY Taco_date DESC)
  FROM dbo.Taco
), rest AS (SELECT * FROM x WHERE rn > 1)
SELECT 
  main.Taco_ID, 
  Taco_date = MIN(
  CASE 
    WHEN main.Taco_value = rest.Taco_value 
    THEN rest.Taco_date ELSE main.Taco_date 
  END)
FROM x AS main INNER JOIN rest -- ***** change this to INNER JOIN *****
ON main.Taco_ID = rest.Taco_ID AND rest.rn > 1
WHERE main.rn = 1
AND NOT EXISTS
(
  SELECT 1 FROM rest AS rest2
   WHERE Taco_ID = rest.Taco_ID
   AND rn < rest.rn
   AND Taco_value <> rest.Taco_value
)
AND EXISTS -- ***** add this EXISTS clause ***** 
(
  SELECT 1 FROM rest AS rest2
   WHERE Taco_ID = rest.Taco_ID
   AND Taco_value <> rest.Taco_value
)
GROUP BY main.Taco_ID;

Updated SQLfiddle examples


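I've noticed some significant performance issues with OVER, but I've only used it a few times and may have written it poorly. Have you noticed anything?
Kenneth Fisher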

1
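@KennethFisher Not specifically with OVER. Like anything else, query constructs rely heavily on the underlying schema and indexes to work well. An OVER clause that partitions will suffer from the same problems as a GROUP BY.
Aaron Bertrand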

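@KennethFisher Please be careful not to draw broad, sweeping conclusions from singular, isolated observations. I see the same argument made against CTEs: "Well, I had this recursive CTE once, and its performance sucked. So I don't use CTEs anymore."
Aaron Bertrand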

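That's why I asked. I haven't used it enough to say one way or the other, but the few times I have used it I was able to get better performance with a CTE. I'll keep playing with it, though.
Kenneth Fisher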

@AaronBertrand I don't think these will work if a value re-appears: Fiddle
ypercubeᵀᴹ

13

Basically, this is @Taryn's suggestion "condensed" into a single SELECT with no derived tables:

SELECT DISTINCT
  Taco_ID,
  Taco_date = MAX(MIN(Taco_date)) OVER (PARTITION BY Taco_ID)
FROM Taco
GROUP BY
  Taco_ID,
  Taco_value
;

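Note: this solution relies on the stipulation that Taco_value can only increase. (More exactly, it assumes that Taco_value cannot change back to a previous value, which is in fact the same assumption as in the linked answer.)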
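To make that limitation concrete, here is a small sketch using hypothetical extra rows (Taco_ID 2 is not part of the original sample data) in which the value goes from 1 to 2 and back to 1:

```sql
-- Hypothetical rows, not in the original sample: the value goes 1 -> 2 -> 1.
INSERT INTO Taco
VALUES (2, 1, '2012-07-01 00:00:01'),
       (2, 2, '2012-07-01 00:00:02'),
       (2, 1, '2012-07-01 00:00:03');

-- Grouping by (Taco_ID, Taco_value) yields two groups for Taco_ID 2:
--   value 1 -> MIN(Taco_date) = 00:00:01
--   value 2 -> MIN(Taco_date) = 00:00:02
-- The maximum of those minimums is 00:00:02, so the query reports 00:00:02
-- for Taco_ID 2, although the last actual change happened at 00:00:03.
SELECT DISTINCT
  Taco_ID,
  Taco_date = MAX(MIN(Taco_date)) OVER (PARTITION BY Taco_ID)
FROM Taco
GROUP BY
  Taco_ID,
  Taco_value
;
```

As long as a value never returns to an earlier one, the first occurrence of the newest value is the last change, which is why the query works for the OP's always-increasing data.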

A SQL Fiddle demo for the query: http://sqlfiddle.com/#!3/91368/2


7
Whoa, nested MAX/MIN. MIND BLOWN +1
Aaron Bertrand

7

You should be able to use both the min() and max() aggregate functions to get the result:

select t1.Taco_ID, MAX(t1.taco_date) Taco_Date
from taco t1
inner join
(
    select MIN(taco_date) taco_date,
        Taco_ID, Taco_value
    from Taco
    group by Taco_ID, Taco_value
) t2
    on t1.Taco_ID = t2.Taco_ID
    and t1.Taco_date = t2.taco_date
group by t1.Taco_Id

See SQL Fiddle with Demo


5

Another answer based on the assumption that values do not re-appear (this is basically @Aaron's query 2, condensed into one less level of nesting):

;WITH x AS
(
  SELECT 
    Taco_ID, Taco_value, 
    Rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID
                            ORDER BY MIN(Taco_date) DESC),
    Taco_date = MIN(Taco_date) 
  FROM dbo.Taco
  GROUP BY Taco_ID, Taco_value
)
SELECT Taco_ID, Taco_value, Taco_date
FROM x 
WHERE Rn = 1 ;

Test at: SQL-Fiddle


An answer for the more general problem, where values can re-appear:

;WITH x AS
(
  SELECT 
    Taco_ID, Taco_value, 
    Rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID
                            ORDER BY MAX(Taco_date) DESC),    
    Taco_date = MAX(Taco_date) 
  FROM dbo.Taco
  GROUP BY Taco_ID, Taco_value
)
SELECT t.Taco_ID, Taco_date = MIN(t.Taco_date)
FROM x
  JOIN dbo.Taco t
    ON  t.Taco_ID = x.Taco_ID
    AND t.Taco_date > x.Taco_date
WHERE x.Rn = 2 
GROUP BY t.Taco_ID ;

(or using CROSS APPLY, so that the whole related row, including the value, is shown):

;WITH x AS
(
  SELECT 
    Taco_ID, Taco_value, 
    Rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID
                            ORDER BY MAX(Taco_date) DESC),    
    Taco_date = MAX(Taco_date) 
  FROM dbo.Taco
  GROUP BY Taco_ID, Taco_value
)
SELECT t.*
FROM x
  CROSS APPLY 
  ( SELECT TOP (1) *
    FROM dbo.Taco t
    WHERE t.Taco_ID = x.Taco_ID
      AND t.Taco_date > x.Taco_date
    ORDER BY t.Taco_date
  ) t
WHERE x.Rn = 2 ;

Test at: SQL-Fiddle-2


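The suggestions for the more general problem do not work for IDs that have no changes. That could be fixed by adding dummy entries to the original set (something like dbo.Taco UNION ALL SELECT DISTINCT Taco_ID, NULL AS Taco_value, '19000101' AS Taco_date).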
Andriy M
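
A minimal sketch of that dummy-entry idea, applied to the first of the general queries. It assumes that no real row has a NULL Taco_value or a Taco_date on or before 1900-01-01; the CTE name src is made up for illustration:

```sql
;WITH src AS
(
  SELECT Taco_ID, Taco_value, Taco_date
  FROM dbo.Taco
  UNION ALL
  -- One dummy row per ID, dated before any real row (assumption).
  SELECT DISTINCT Taco_ID, NULL, '19000101'
  FROM dbo.Taco
), x AS
(
  SELECT
    Taco_ID, Taco_value,
    Rn = ROW_NUMBER() OVER (PARTITION BY Taco_ID
                            ORDER BY MAX(Taco_date) DESC),
    Taco_date = MAX(Taco_date)
  FROM src
  GROUP BY Taco_ID, Taco_value
)
SELECT t.Taco_ID, Taco_date = MIN(t.Taco_date)
FROM x
  JOIN dbo.Taco t              -- join back to the real rows only
    ON  t.Taco_ID = x.Taco_ID
    AND t.Taco_date > x.Taco_date
WHERE x.Rn = 2
GROUP BY t.Taco_ID ;
```

For an ID with a single value, the dummy group becomes Rn = 2, so the query returns that ID's earliest row. For IDs with two or more values the dummy group ranks below Rn = 2 and has no effect.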

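@AndriyM I know. I assumed that "change" means they want results only when there are at least 2 values; the OP hasn't clarified that (and it was easier to write :)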
ypercubeᵀᴹ

2

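FYI, +1 for providing sample structure and data. The only thing I could have asked for is the expected output for that data.

EDIT: This one was going to drive me nuts. I just knew there was a "simple" way to do this. I got rid of my incorrect solutions and put up one that I believe is correct. Here is a solution similar to @bluefeet's, but it covers the tests that @AaronBertrand gave.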

;WITH TacoMin AS (SELECT Taco_ID, Taco_value, MIN(Taco_date) InitialValueDate
                FROM Taco
                GROUP BY Taco_ID, Taco_value)
SELECT Taco_ID, MAX(InitialValueDate)
FROM TacoMin
GROUP BY Taco_ID

2
The OP doesn't ask for the latest date; they ask when the value changes.
ypercubeᵀᴹ

Ahh, I see my mistake. I worked out an answer, but it's pretty much the same as @Aaron's, so there's no point in posting it.
Kenneth Fisher

1

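Why not just take the difference between a value and its lag (previous) value? If the difference is zero the value did not change; if it is non-zero, it changed. This can be done with a simple query: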

-- example gives the times the value changed in the last 24 hrs
SELECT
    LastUpdated, [DiffValue]
FROM (
  SELECT
      LastUpdated,
      a.AboveBurdenProbe1TempC - coalesce(lag(a.AboveBurdenProbe1TempC) over (order by ProcessHistoryId), 0) as [DiffValue]
  FROM BFProcessHistory a
  WHERE LastUpdated > getdate() - 1
) b
WHERE [DiffValue] <> 0
ORDER BY LastUpdated ASC
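
Adapted to the question's Taco table, the same idea might look like the following minimal sketch. It assumes SQL Server 2012 or later (where LAG is available) and that Taco_value is never NULL; the CTE name marked and the column prev_value are made up for illustration:

```sql
;WITH marked AS
(
  SELECT Taco_ID, Taco_value, Taco_date,
    -- Value of the previous row for the same ID, NULL on the first row.
    prev_value = LAG(Taco_value) OVER (PARTITION BY Taco_ID
                                       ORDER BY Taco_date)
  FROM dbo.Taco
)
SELECT Taco_ID, Taco_date = MAX(Taco_date)
FROM marked
WHERE prev_value IS NOT NULL       -- skip the first row of each ID
  AND prev_value <> Taco_value     -- keep only rows where the value changed
GROUP BY Taco_ID;
```

Because each row is compared only with its predecessor, this also copes with a value that later returns to an earlier one. IDs whose value never changes produce no row.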

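The LAG analytic function was only "recently" introduced, in SQL Server 2012. The original question asks for a solution on SQL Server 2008 R2. Your solution will not work on SQL Server 2008 R2.
John aka hot2use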

-1

Can it be as simple as the following?

       SELECT taco_id, MAX(
             CASE 
                 WHEN taco_value <> MAX(taco_value) 
                 THEN taco_date 
                 ELSE null 
             END) AS last_change_date

Given that taco_value always increases?

P.S. I'm quite a SQL beginner myself, learning slowly but surely.


1
On SQL Server this gives an error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery
Martin Smith

2
Adding to Martin's comment: you are on the safe side if you post only tested code. If you are not at your usual playground, an easy way to test is to go to sqlfiddle.com.
dezso
Licensed under cc by-sa 3.0 with attribution required.