How to recursively find gaps of more than 90 days between rows


17

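In my C# home world this is a fairly trivial task, but I have not yet managed it in SQL, and I would prefer a set-based solution (no cursors). The result set should come from a query like this.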

SELECT SomeId, MyDate, 
    dbo.udfLastHitRecursive(param1, param2, MyDate) as 'Qualifying'
FROM T

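How it should work

I send those three parameters into a UDF.
The UDF uses the parameters internally to fetch the related rows from a view, going back up to 90 days.
The UDF traverses 'MyDate' and returns 1 if the row should be included in a total calculation.
If it should not, it returns 0. This is named "Qualifying" here.

What the UDF will do

List the rows in date order. Calculate the days between rows. The first row in the result set defaults to Hit = 1. If the difference is 90 days or less, pass on to the next row until the sum of the gaps exceeds 90 days (the 90th day must have passed). When that point is reached, set Hit to 1 and reset the gap to 0. Omitting the non-qualifying rows from the result instead would also work.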

                                          |(column by udf, which not work yet)
Date              Calc_date     MaxDiff   | Qualifying
2014-01-01 11:00  2014-01-01    0         | 1
2014-01-03 10:00  2014-01-01    2         | 0
2014-01-04 09:30  2014-01-03    1         | 0
2014-04-01 10:00  2014-01-04    87        | 0
2014-05-01 11:00  2014-04-01    30        | 1

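In the table above, the MaxDiff column is the gap in days from the date on the previous row. The problem with my attempts so far is that I cannot make the query ignore the second-to-last row in the sample above.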
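For reference, the rule described above can be modelled outside SQL. The following is a minimal Python sketch, not the UDF itself. It assumes the gap is counted in whole calendar days (the way DATEDIFF(day, ...) counts) from the last qualifying row, and the name qualifying_flags is made up for illustration.

```python
from datetime import datetime

def qualifying_flags(visits):
    """Return (visit, flag) pairs; flag is 1 when the row opens a new 90-day window.

    Assumption: the gap is a calendar-date difference measured from the
    last qualifying row, not from the immediately preceding row.
    """
    flags = []
    last_hit = None
    for visit in sorted(visits):
        if last_hit is None or (visit.date() - last_hit.date()).days > 90:
            flags.append((visit, 1))
            last_hit = visit  # reset the window at this row
        else:
            flags.append((visit, 0))
    return flags

# Sample rows from the table in the question
sample = [
    datetime(2014, 1, 1, 11, 0),
    datetime(2014, 1, 3, 10, 0),
    datetime(2014, 1, 4, 9, 30),
    datetime(2014, 4, 1, 10, 0),
    datetime(2014, 5, 1, 11, 0),
]

for visit, flag in qualifying_flags(sample):
    print(visit, flag)
```

With the sample rows, only the first and the last row are flagged, which matches the Qualifying column in the table.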

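[EDIT]
As requested in the comments, I have added a tag and pasted the UDF I have just compiled. It is only a placeholder, though, and does not give a useful result.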

;WITH cte (someid, otherkey, mydate, cost) AS
(
    SELECT someid, otherkey, mydate, cost
    FROM dbo.vGetVisits
    WHERE someid = @someid AND VisitCode = 3 AND otherkey = @otherkey 
    AND CONVERT(Date,mydate) = @VisitDate

    UNION ALL

    SELECT top 1 e.someid, e.otherkey, e.mydate, e.cost
    FROM dbo.vGetVisits AS E
    WHERE CONVERT(date, e.mydate) 
        BETWEEN DateAdd(dd,-90,CONVERT(Date,@VisitDate)) AND CONVERT(Date,@VisitDate)
        AND e.someid = @someid AND e.VisitCode = 3 AND e.otherkey = @otherkey 
        AND CONVERT(Date,e.mydate) = @VisitDate
        order by e.mydate
)

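I have another query, defined separately, that is closer to what I need, but it is blocked by the fact that I cannot calculate on windowed columns. I also tried a similar one that gives more or less the same output using LAG() over MyDate, wrapped in a DATEDIFF.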

SELECT
    t.Mydate, t.VisitCode, t.Cost, t.SomeId, t.otherkey, t.MaxDiff, t.DateDiff
FROM 
(
    SELECT *,
        MaxDiff = LAST_VALUE(Diff.Diff)  OVER (
            ORDER BY Diff.Mydate ASC
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM 
    (
        SELECT *,
            Diff =  ISNULL(DATEDIFF(DAY, LAST_VALUE(r.Mydate) OVER (
                        ORDER BY r.Mydate ASC
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 
                                r.Mydate),0),
            DateDiff =  ISNULL(LAST_VALUE(r.Mydate) OVER (
                        ORDER BY r.Mydate ASC
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 
                                r.Mydate)
        FROM dbo.vGetVisits AS r
        WHERE r.VisitCode = 3 AND r.SomeId = @SomeID AND r.otherkey = @otherkey
    ) AS Diff
) AS t
WHERE t.VisitCode = 3 AND t.SomeId = @SomeId AND t.otherkey = @otherkey
    AND t.Diff <= 90
ORDER BY
    t.Mydate ASC;
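One way to see why a LAG-style difference alone is not enough: the gap that matters is measured from the last qualifying row, which is only known once the earlier rows have been processed. The small Python sketch below is an illustration only. It uses the sample dates from the question, and lag_diffs is a hypothetical helper name. It shows that none of the row-to-row differences exceeds 90, even though the last row should qualify.

```python
from datetime import datetime

def lag_diffs(visits):
    """Calendar-day difference between each row and the previous one (0 for the first row)."""
    ordered = sorted(visits)
    if not ordered:
        return []
    diffs = [0]
    for prev, cur in zip(ordered, ordered[1:]):
        diffs.append((cur.date() - prev.date()).days)
    return diffs

sample = [
    datetime(2014, 1, 1, 11, 0),
    datetime(2014, 1, 3, 10, 0),
    datetime(2014, 1, 4, 9, 30),
    datetime(2014, 4, 1, 10, 0),
    datetime(2014, 5, 1, 11, 0),
]

diffs = lag_diffs(sample)
print(diffs)            # [0, 2, 1, 87, 30]
print(max(diffs) > 90)  # False: no single row-to-row gap exceeds 90 days
```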

Comments are not for extended discussion; this conversation has been moved to chat.
Paul White

Answers:


22

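As I read the question, the basic recursive algorithm required is:

  1. Return the row with the earliest date in the set
  2. Set that date as "current"
  3. Find the row with the earliest date more than 90 days after the current date
  4. Repeat from step 2 until no more rows are found

This is relatively easy to implement with a recursive common table expression.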
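The four steps can also be sketched outside SQL. This Python version is only an illustration of the same "jump to the next date" idea, not part of the T-SQL solution. The function name window_starts and the use of bisect on a sorted list (standing in for an index seek) are assumptions made for the sketch.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def window_starts(dates, gap=timedelta(days=90)):
    """Earliest date first, then repeatedly the earliest date more than `gap` after the current one."""
    ordered = sorted(dates)
    starts = []
    i = 0
    while i < len(ordered):
        starts.append(ordered[i])  # steps 1 and 2: this date becomes "current"
        # Step 3: first position whose value is strictly greater than current + gap
        i = bisect_right(ordered, ordered[i] + gap, lo=i)
    return starts  # step 4: the loop ends when no later date is found

dates = [
    datetime(2014, 1, 1, 11, 0),
    datetime(2014, 1, 3, 10, 0),
    datetime(2014, 1, 4, 9, 30),
    datetime(2014, 4, 1, 10, 0),
    datetime(2014, 5, 1, 11, 0),
    datetime(2014, 7, 1, 9, 0),
    datetime(2014, 7, 31, 8, 0),
]

for start in window_starts(dates):
    print(start)
```

Each iteration skips straight to the next qualifying date instead of visiting every row, which is why an ordered index on the date column matters for this approach.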

For example, using the following sample data (based on the question):

DECLARE @T AS table (TheDate datetime PRIMARY KEY);

INSERT @T (TheDate)
VALUES
    ('2014-01-01 11:00'),
    ('2014-01-03 10:00'),
    ('2014-01-04 09:30'),
    ('2014-04-01 10:00'),
    ('2014-05-01 11:00'),
    ('2014-07-01 09:00'),
    ('2014-07-31 08:00');

The recursive code is:

WITH CTE AS
(
    -- Anchor:
    -- Start with the earliest date in the table
    SELECT TOP (1)
        T.TheDate
    FROM @T AS T
    ORDER BY
        T.TheDate

    UNION ALL

    -- Recursive part   
    SELECT
        SQ1.TheDate
    FROM 
    (
        -- Recursively find the earliest date that is 
        -- more than 90 days after the "current" date
        -- and set the new date as "current".
        -- ROW_NUMBER + rn = 1 is a trick to get
        -- TOP in the recursive part of the CTE
        SELECT
            T.TheDate,
            rn = ROW_NUMBER() OVER (
                ORDER BY T.TheDate)
        FROM CTE
        JOIN @T AS T
            ON T.TheDate > DATEADD(DAY, 90, CTE.TheDate)
    ) AS SQ1
    WHERE
        SQ1.rn = 1
)
SELECT 
    CTE.TheDate 
FROM CTE
OPTION (MAXRECURSION 0);

The results are:

TheDate
-----------------------
2014-01-01 11:00:00.000
2014-05-01 11:00:00.000
2014-07-31 08:00:00.000

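With an index that has TheDate as its leading key, the execution plan is very efficient:

(Execution plan image)

You could choose to wrap this in a function and execute it directly against the view mentioned in the question, but my instincts are against it. Usually, performance is better when you select the rows from the view into a temporary table, provide an appropriate index on the temporary table, and then apply the logic above. The details depend on the details of the view, but that is my general experience.

For completeness (and prompted by ypercube's answer), I should mention that my other go-to solution for this type of problem (until T-SQL gets proper ordered set functions) is a SQLCLR cursor (see my answer here for an example of the technique). This performs much better than a T-SQL cursor, and it is convenient for those who have skills in .NET languages and the ability to run SQLCLR in their production environment. It may not offer much over the recursive solution in this scenario, because the majority of the cost is the sort, but it is worth mentioning.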


9

Since this is a SQL Server 2014 question, I might as well add a natively compiled stored procedure version of a "cursor".

Source table with some data:

create table T 
(
  TheDate datetime primary key
);

go

insert into T(TheDate) values
('2014-01-01 11:00'),
('2014-01-03 10:00'),
('2014-01-04 09:30'),
('2014-04-01 10:00'),
('2014-05-01 11:00'),
('2014-07-01 09:00'),
('2014-07-31 08:00');

A table type that is the parameter to the stored procedure. Adjust the bucket_count appropriately.

create type TType as table
(
  ID int not null primary key nonclustered hash with (bucket_count = 16),
  TheDate datetime not null
) with (memory_optimized = on);

And a stored procedure that loops through the table-valued parameter and collects the qualifying rows in @R.

create procedure dbo.GetDates
  @T dbo.TType readonly
with native_compilation, schemabinding, execute as owner 
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on)

  declare @R dbo.TType;
  declare @ID int = 0;
  declare @RowsLeft bit = 1;  
  declare @CurDate datetime = '1901-01-01';
  declare @LastDate datetime = '1901-01-01';

  while @RowsLeft = 1
  begin
    set @ID += 1;

    select @CurDate = T.TheDate
    from @T as T
    where T.ID = @ID

    if @@rowcount = 1
    begin
      if datediff(day, @LastDate, @CurDate) > 90
      begin
        insert into @R(ID, TheDate) values(@ID, @CurDate);
        set @LastDate = @CurDate;
      end;
    end
    else
    begin
      set @RowsLeft = 0;
    end

  end;

  select R.TheDate
  from @R as R;
end

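Code to fill a memory-optimized table variable, which is used as the parameter to the natively compiled stored procedure, and to call the procedure.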

declare @T dbo.TType;

insert into @T(ID, TheDate)
select row_number() over(order by T.TheDate),
       T.TheDate
from T;

exec dbo.GetDates @T;

Result:

TheDate
-----------------------
2014-07-31 08:00:00.000
2014-01-01 11:00:00.000
2014-05-01 11:00:00.000

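Update:

If for some reason you do not need to visit every row in the table, you can do the equivalent of the "jump to next date" version that is implemented in the recursive CTE by Paul White.

The data type does not need the ID column, and you should not use a hash index.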

create type TType as table
(
  TheDate datetime not null primary key nonclustered
) with (memory_optimized = on);

The stored procedure uses a select top(1) .. to find the next value.

create procedure dbo.GetDates
  @T dbo.TType readonly
with native_compilation, schemabinding, execute as owner 
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on)

  declare @R dbo.TType;
  declare @RowsLeft bit = 1;  
  declare @CurDate datetime = '1901-01-01';

  while @RowsLeft = 1
  begin

    select top(1) @CurDate = T.TheDate
    from @T as T
    where T.TheDate > dateadd(day, 90, @CurDate)
    order by T.TheDate;

    if @@rowcount = 1
    begin
      insert into @R(TheDate) values(@CurDate);
    end
    else
    begin
      set @RowsLeft = 0;
    end

  end;

  select R.TheDate
  from @R as R;
end

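Solutions that use DATEADD and DATEDIFF can return different results, depending on the initial data set.
Pavel Nefyodov 2014

@PavelNefyodov I don't see that. Can you explain or give an example?
Mikael Eriksson 2014

Could you please check it with dates like ('2014-01-01 00:00:00.000'), ('2014-04-01 01:00:00.000')? More information can be found in my answer.
Pavel Nefyodov

@PavelNefyodov Ah, I see. So if I change the second one to T.TheDate >= dateadd(day, 91, @CurDate), all would be okay, right?
Mikael Eriksson 2014

Or, if appropriate for the OP, change the data type of TheDate in TType to Date.
Mikael Eriksson 2014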

5

A solution that uses a cursor.
(First, some needed tables and variables.)

-- a table to hold the results
DECLARE @cd TABLE
(   TheDate datetime PRIMARY KEY,
    Qualify INT NOT NULL
);

-- some variables
DECLARE
    @TheDate DATETIME,
    @diff INT,
    @Qualify     INT = 0,
    @PreviousCheckDate DATETIME = '1900-01-01 00:00:00' ;

The actual cursor:

-- declare the cursor
DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
    SELECT TheDate
      FROM T
      ORDER BY TheDate ;

-- using the cursor to fill the @cd table
OPEN c ;

FETCH NEXT FROM c INTO @TheDate ;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @diff = DATEDIFF(day, @PreviousCheckDate, @Thedate) ;
    SET @Qualify = CASE WHEN @diff > 90 THEN 1 ELSE 0 END ;

    INSERT @cd (TheDate, Qualify)
        SELECT @TheDate, @Qualify ;

    SET @PreviousCheckDate = 
            CASE WHEN @diff > 90 
                THEN @TheDate 
                ELSE @PreviousCheckDate END ;

    FETCH NEXT FROM c INTO @TheDate ;
END

CLOSE c;
DEALLOCATE c;

And getting the results:

-- get the results
SELECT TheDate, Qualify
    FROM @cd
    -- WHERE Qualify = 1        -- optional, to see only the qualifying rows
    ORDER BY TheDate ;

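Tested at SQLFiddle


+1 for this solution, but not because it is the most efficient way of doing things.
Pavel Nefyodov 2014

@PavelNefyodov Then we should test performance!
ypercubeᵀᴹ

I trust Paul White on that. My experience with performance testing is not that impressive. Again, that does not stop me from up-voting your answer.
Pavel Nefyodov

Thank you, ypercube. As expected, it is fast on a limited number of rows. On 13,000 rows, the CTE and this solution performed more or less the same. On 130,000 rows, there was a 600% difference. On 13 million rows, it passed 15 minutes on my test equipment. I also had to remove the primary key, which may affect performance a bit.
Independent

Thanks for testing. You could also test a modified version that does the INSERT @cd only when @Qualify=1 (and so does not insert 13 million rows if you do not need all of them in the output). The solution also depends on finding an index on TheDate. If there is none, it will not be efficient.
ypercubeᵀᴹ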

2
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[vGetVisits]') AND type in (N'U'))
DROP TABLE [dbo].[vGetVisits]
GO

CREATE TABLE [dbo].[vGetVisits](
    [id] [int] NOT NULL,
    [mydate] [datetime] NOT NULL,
 CONSTRAINT [PK_vGetVisits] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)
)

GO

INSERT INTO [dbo].[vGetVisits]([id], [mydate])
VALUES
    (1, '2014-01-01 11:00'),
    (2, '2014-01-03 10:00'),
    (3, '2014-01-04 09:30'),
    (4, '2014-04-01 10:00'),
    (5, '2014-05-01 11:00'),
    (6, '2014-07-01 09:00'),
    (7, '2014-07-31 08:00');
GO


-- Clean up 
IF OBJECT_ID (N'dbo.udfLastHitRecursive', N'FN') IS NOT NULL
DROP FUNCTION udfLastHitRecursive;
GO

-- Actual Function  
CREATE FUNCTION dbo.udfLastHitRecursive
( @MyDate datetime)

RETURNS TINYINT

AS
    BEGIN 
        -- Your returned value 1 or 0
        DECLARE @Returned_Value TINYINT;
        SET @Returned_Value=0;
    -- Prepare gaps table to be used.
    WITH gaps AS
    (
                        -- Select Date and MaxDiff from the original table
                        SELECT 
                        CONVERT(Date,mydate) AS [date]
                        , DATEDIFF(day,ISNULL(LAG(mydate, 1) OVER (ORDER BY mydate), mydate) , mydate) AS [MaxDiff]
                        FROM dbo.vGetVisits
    )

        SELECT @Returned_Value=
            (SELECT DISTINCT -- DISTINCT in case we have same date but different time
                    CASE WHEN
                     (
                    -- It is a first entry
                    [date]=(SELECT MIN(CONVERT(Date,mydate)) FROM dbo.vGetVisits))
                    OR 
                    /* 
                    --Gap between last qualifying date and entered is greater than 90 
                        Calculate Running sum upto and including required date 
                        and find a remainder of division by 91. 
                    */
                     ((SELECT SUM(t1.MaxDiff)  
                    FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<=t2.[date] 
                    ) t1 
                    )%91 - 
                    /* 
                        ISNULL added to include first value that always returns NULL 
                        Calculate Running sum upto and NOT including required date 
                        and find a remainder of division by 91 
                    */
                    ISNULL((SELECT SUM(t1.MaxDiff)  
                    FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<t2.[date] 
                    ) t1 
                    )%91, 0) -- End ISNULL
                     <0 )
                    /* End Running sum upto and including required date */
                    OR
                    -- Gap between two nearest dates is greater than 90 
                    ((SELECT SUM(t1.MaxDiff)  
                    FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<=t2.[date] 
                    ) t1 
                    ) - ISNULL((SELECT SUM(t1.MaxDiff)  
                    FROM (SELECT [MaxDiff] FROM gaps WHERE [date]<t2.[date] 
                    ) t1 
                    ), 0) > 90) 
                    THEN 1
                    ELSE 0
                    END 
                    AS [Qualifying]
                    FROM gaps t2
                    WHERE [date]=CONVERT(Date,@MyDate))
        -- What is necessary to return when the entered date is not in dbo.vGetVisits?
        RETURN @Returned_Value
    END
GO

SELECT 
dbo.udfLastHitRecursive(mydate) AS [Qualifying]
, [id]
, mydate 
FROM dbo.vGetVisits
ORDER BY mydate 

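Result: (image)

Also have a look at How to Calculate Running Total in SQL Server

Update: please see the results of performance testing below.

Because different logic is used to find the "90-day gap", ypercube's solution and mine, if left intact, may return different results from Paul White's solution. This is because they use the DATEDIFF and DATEADD functions, respectively.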

For example:

SELECT DATEADD(DAY, 90, '2014-01-01 00:00:00.000')

returns '2014-04-01 00:00:00.000', meaning that '2014-04-01 01:00:00.000' is beyond the 90-day gap

SELECT DATEDIFF(DAY, '2014-01-01 00:00:00.000', '2014-04-01 01:00:00.000')

returns 90, meaning that it is still within the gap.
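The same boundary case can be reproduced outside SQL Server. The following Python sketch is only a model of the two functions' behavior: datediff_day is a hypothetical helper that mimics DATEDIFF(DAY, ...) by comparing calendar dates and ignoring the time of day, and the timedelta comparison stands in for the DATEADD test.

```python
from datetime import datetime, timedelta

def datediff_day(start, end):
    """Mimic DATEDIFF(DAY, start, end): date boundaries crossed, time of day ignored."""
    return (end.date() - start.date()).days

start = datetime(2014, 1, 1, 0, 0)
end = datetime(2014, 4, 1, 1, 0)

# DATEADD-style test: exact timestamps are compared
past_gap_by_dateadd = end > start + timedelta(days=90)
# DATEDIFF-style test: only whole calendar days are counted
past_gap_by_datediff = datediff_day(start, end) > 90

print(past_gap_by_dateadd)   # True
print(past_gap_by_datediff)  # False
```

The two tests disagree for this pair of values because the one-hour offset matters to the timestamp comparison but not to the day count.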

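Consider the example of a retailer. Selling a perishable product with a sell-by date of '2014-01-01' at '2014-01-01 23:59:59:999' is fine. So the value of DATEDIFF(DAY, ...) is OK in this case.

Another example is a patient waiting to be seen. For someone who arrives at '2014-01-01 00:00:00:000' and leaves at '2014-01-01 23:59:59:999', DATEDIFF gives 0 (zero) days, even though the actual wait was almost 24 hours. Conversely, a patient who arrives at '2014-01-01 23:59:59' and walks away at '2014-01-02 00:00:01' has waited one day according to DATEDIFF.

But I digress.

I left the DATEDIFF solutions in and even performance tested them, but they should really be in their own league.

It was also noted that for big data sets it is impossible to avoid values that fall on the same day. If we have, say, 13 million records spanning 2 years of data, we will end up with more than one record for some days. Those records are filtered out at the earliest opportunity in my DATEDIFF solution and in ypercube's. I hope ypercube does not mind this.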
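The early filtering keeps only the first record of each calendar day, which is what the ROW_NUMBER() ... PARTITION BY cast(mydate as date) subquery in the tested code does. As a rough Python illustration of that step (the function name first_per_day is made up for this sketch):

```python
from datetime import datetime
from itertools import groupby

def first_per_day(visits):
    """Keep only the earliest timestamp of each calendar day."""
    ordered = sorted(visits)
    # groupby needs sorted input; the first item of each group is that day's earliest row
    return [next(group) for _, group in groupby(ordered, key=lambda v: v.date())]

visits = [
    datetime(2014, 1, 1, 11, 0),
    datetime(2014, 1, 1, 15, 30),
    datetime(2014, 1, 3, 10, 0),
    datetime(2014, 1, 3, 9, 0),
]

print(first_per_day(visits))
```

This is safe for the DATEDIFF-based logic because DATEDIFF(day, ...) ignores the time of day, so later rows on the same day can never change the outcome.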

The solutions were tested on the following table

CREATE TABLE [dbo].[vGetVisits](
    [id] [int] NOT NULL,
    [mydate] [datetime] NOT NULL,
) 

with two different clustered indexes (on mydate in this case):

CREATE CLUSTERED INDEX CI_mydate on vGetVisits(mydate) 
GO

The table was populated in the following way

SET NOCOUNT ON
GO

INSERT INTO dbo.vGetVisits(id, mydate)
VALUES (1, '01/01/1800')
GO

DECLARE @i bigint
SET @i=2

DECLARE @MaxRows bigint
SET @MaxRows=13001

WHILE @i<@MaxRows 
BEGIN
INSERT INTO dbo.vGetVisits(id, mydate)
VALUES (@i, DATEADD(day,FLOOR(RAND()*(3)),(SELECT MAX(mydate) FROM dbo.vGetVisits)))
SET @i=@i+1
END

For the multi-million-row case, the INSERT was changed so that entries were added at random steps of 0 to 20 minutes.
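For anyone who wants comparable test data outside SQL Server, the population loop can be approximated in Python. This is a sketch under assumptions: make_visits and its parameters are invented names, the fixed seed is added only to make runs repeatable, and random.Random replaces T-SQL's RAND(), so the generated values will not match the original test table.

```python
import random
from datetime import datetime, timedelta

def make_visits(rows=13000, start=datetime(1800, 1, 1), max_step_days=2, seed=0):
    """Build `rows` non-decreasing dates, each 0..max_step_days days after the previous one."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    visits = [start]
    while len(visits) < rows:
        step = timedelta(days=rng.randint(0, max_step_days))  # 0, 1 or 2 days by default
        visits.append(visits[-1] + step)
    return visits

visits = make_visits()
print(len(visits), visits[0], visits[-1])
```

Because a step of 0 days is allowed, the data contains duplicate dates, just as the T-SQL loop produces them.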

All solutions were carefully wrapped in the following code

SET NOCOUNT ON
GO

DECLARE @StartDate DATETIME

SET @StartDate = GETDATE()

--- Code goes here

PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))

The actual code tested (in no particular order):

Ypercube's DATEDIFF solution (YPC, DATEDIFF)

DECLARE @cd TABLE
(   TheDate datetime PRIMARY KEY,
    Qualify INT NOT NULL
);

DECLARE
    @TheDate DATETIME,
    @Qualify     INT = 0,
    @PreviousCheckDate DATETIME = '1799-01-01 00:00:00' 


DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
SELECT 
   mydate
FROM 
 (SELECT
       RowNum = ROW_NUMBER() OVER(PARTITION BY cast(mydate as date) ORDER BY mydate)
       , mydate
   FROM 
       dbo.vGetVisits) Actions
WHERE
   RowNum = 1
ORDER BY 
  mydate;

OPEN c ;

FETCH NEXT FROM c INTO @TheDate ;

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @Qualify = CASE WHEN DATEDIFF(day, @PreviousCheckDate, @Thedate) > 90 THEN 1 ELSE 0 END ;
    IF  @Qualify=1
    BEGIN
        INSERT @cd (TheDate, Qualify)
        SELECT @TheDate, @Qualify ;
        SET @PreviousCheckDate=@TheDate 
    END
    FETCH NEXT FROM c INTO @TheDate ;
END

CLOSE c;
DEALLOCATE c;


SELECT TheDate
    FROM @cd
    ORDER BY TheDate ;

Ypercube's DATEADD solution (YPC, DATEADD)

DECLARE @cd TABLE
(   TheDate datetime PRIMARY KEY,
    Qualify INT NOT NULL
);

DECLARE
    @TheDate DATETIME,
    @Next_Date DATETIME,
    @Interesting_Date DATETIME,
    @Qualify     INT = 0

DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
  SELECT 
  [mydate]
  FROM [test].[dbo].[vGetVisits]
  ORDER BY mydate
  ;

OPEN c ;

FETCH NEXT FROM c INTO @TheDate ;

SET @Interesting_Date=@TheDate

INSERT @cd (TheDate, Qualify)
SELECT @TheDate, @Qualify ;

WHILE @@FETCH_STATUS = 0
BEGIN

    IF @TheDate>DATEADD(DAY, 90, @Interesting_Date)
    BEGIN
        INSERT @cd (TheDate, Qualify)
        SELECT @TheDate, @Qualify ;
        SET @Interesting_Date=@TheDate;
    END

    FETCH NEXT FROM c INTO @TheDate;
END

CLOSE c;
DEALLOCATE c;


SELECT TheDate
    FROM @cd
    ORDER BY TheDate ;

Paul White's solution (PW)

;WITH CTE AS
(
    SELECT TOP (1)
        T.[mydate]
    FROM dbo.vGetVisits AS T
    ORDER BY
        T.[mydate]

    UNION ALL

    SELECT
        SQ1.[mydate]
    FROM 
    (
        SELECT
            T.[mydate],
            rn = ROW_NUMBER() OVER (
                ORDER BY T.[mydate])
        FROM CTE
        JOIN dbo.vGetVisits AS T
            ON T.[mydate] > DATEADD(DAY, 90, CTE.[mydate])
    ) AS SQ1
    WHERE
        SQ1.rn = 1
)

SELECT 
    CTE.[mydate]
FROM CTE
OPTION (MAXRECURSION 0);

My DATEADD solution (PN, DATEADD)

DECLARE @cd TABLE
(   TheDate datetime PRIMARY KEY
);

DECLARE @TheDate DATETIME

SET @TheDate=(SELECT MIN(mydate) as mydate FROM [dbo].[vGetVisits])

WHILE (@TheDate IS NOT NULL)
    BEGIN

        INSERT @cd (TheDate) SELECT @TheDate;

        SET @TheDate=(  
            SELECT MIN(mydate) as mydate 
            FROM [dbo].[vGetVisits]
            WHERE mydate>DATEADD(DAY, 90, @TheDate)
                    )
    END

SELECT TheDate
    FROM @cd
    ORDER BY TheDate ;

My DATEDIFF solution (PN, DATEDIFF)

DECLARE @MinDate DATETIME;
SET @MinDate=(SELECT MIN(mydate) FROM dbo.vGetVisits);
    ;WITH gaps AS
    (
       SELECT 
       t1.[date]
       , t1.[MaxDiff]
       , SUM(t1.[MaxDiff]) OVER (ORDER BY t1.[date]) AS [Running Total]
            FROM
            (
                SELECT 
                mydate AS [date]
                , DATEDIFF(day,LAG(mydate, 1, mydate) OVER (ORDER BY mydate) , mydate) AS [MaxDiff] 
                FROM 
                    (SELECT
                    RowNum = ROW_NUMBER() OVER(PARTITION BY cast(mydate as date) ORDER BY mydate)
                    , mydate
                    FROM dbo.vGetVisits
                    ) Actions
                WHERE RowNum = 1
            ) t1
    )

    SELECT [date]
    FROM gaps t2
    WHERE                         
         ( ([Running Total])%91 - ([Running Total]- [MaxDiff])%91 <0 )      
         OR
         ( [MaxDiff] > 90) 
         OR
         ([date]=@MinDate)    
    ORDER BY [date]

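I am using SQL Server 2012, so apologies to Mikael Eriksson, but his code will not be tested here. I would still expect his DATEDIFF and DATEADD solutions to return different values on some data sets.

And the actual results are: (image)


Thank you, Pavel. I did not really get a result from your solution in reasonable time. I shrank my test data down to 1000 rows, at which point I got an execution time of 25 seconds. When I added a group by date and converted to dates in the select, I got the correct output! Just for the sake of it, I let the query continue on my small test data table (13k rows) and it ran for over 12 minutes, which means performance worse than o(nx)! So it looks useful for sets that are sure to stay small.
2014

What is the table you used in the tests? How many rows? I am not sure why you had to add a group by date to get the correct output. Please feel free to publish your findings as part of your (updated) question.
Pavel Nefyodov

Hi! I will add that tomorrow. The group by was there to combine duplicate dates. But I was in a hurry (late at night), and maybe that was already done by adding convert(date,z). The number of rows is in my comment. I tried 1000 rows with your solution. I also tried 13,000 rows, with a 12-minute execution time. Paul's and ypercube's solutions were also run against the 130,000-row and 13-million-row tables. The table was a simple table with random dates, ranging from yesterday back to two years ago, with a clustered index on the date field.
2014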

0

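OK, did I miss something, or why would you not just skip the recursion and join the table back on itself? If the date is the primary key, it must be unique, and it must be in chronological order if you plan to calculate an offset to the next row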

    DECLARE @T AS TABLE
  (
     TheDate DATETIME PRIMARY KEY
  );

INSERT @T
       (TheDate)
VALUES ('2014-01-01 11:00'),
       ('2014-01-03 10:00'),
       ('2014-01-04 09:30'),
       ('2014-04-01 10:00'),
       ('2014-05-01 11:00'),
       ('2014-07-01 09:00'),
       ('2014-07-31 08:00');

SELECT [T1].[TheDate]                               [first],
       [T2].[TheDate]                               [next],
       Datediff(day, [T1].[TheDate], [T2].[TheDate])[offset],
       ( CASE
           WHEN Datediff(day, [T1].[TheDate], [T2].[TheDate]) >= 30 THEN 1
           ELSE 0
         END )                                      [qualify]
FROM   @T[T1]
       LEFT JOIN @T[T2]
              ON [T2].[TheDate] = (SELECT Min([TheDate])
                                   FROM   @T
                                   WHERE  [TheDate] > [T1].[TheDate]) 

Yields

(image)

Unless I totally missed something important...


2
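You probably want to change that WHERE [TheDate] > [T1].[TheDate] to take the 90-day difference threshold into account. But even then, your output is not the wanted one.
ypercubeᵀᴹ

Important: your code should have "90" somewhere in it.
Pavel Nefyodov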
Licensed under cc by-sa 3.0 with attribution required.