查询以根据时间重叠确定开始和结束日期


8

给定以下数据:

id      |   user_id |   started             |   closed              |   dead
-------------------------------------------------------------------------------------------
7714    |   238846  |   2015-01-27 15:14:50 |   2015-02-02 14:14:13 |   NULL
7882    |   238846  |   2015-01-28 13:25:58 |   NULL                |   2015-05-15 12:16:07
13190   |   259140  |   2015-03-17 10:11:44 |   NULL                |   2015-03-18 07:31:57
13192   |   259140  |   2015-03-17 10:12:17 |   NULL                |   2015-03-18 11:46:46
13194   |   259140  |   2015-03-17 10:12:53 |   NULL                |   2015-03-18 11:46:36
14020   |   259140  |   2015-03-23 14:32:16 |   2015-03-24 15:57:32 |   NULL
17124   |   242650  |   2015-04-16 16:19:08 |   2015-04-16 16:21:06 |   NULL
19690   |   238846  |   2015-05-15 13:17:31 |   NULL                |   2015-05-27 13:56:43
20038   |   242650  |   2015-05-19 15:38:17 |   NULL                |   NULL
20040   |   242650  |   2015-05-19 15:39:58 |   NULL                |   2015-05-21 12:01:02
20302   |   242650  |   2015-05-21 13:09:06 |   NULL                |   NULL
20304   |   242650  |   2015-05-21 13:09:54 |   NULL                |   NULL
20306   |   242650  |   2015-05-21 13:10:19 |   NULL                |   NULL
20308   |   242650  |   2015-05-21 13:12:20 |   NULL                |   NULL
21202   |   238846  |   2015-05-29 16:47:29 |   NULL                |   NULL
21204   |   238846  |   2015-05-29 16:47:56 |   NULL                |   NULL
21208   |   238846  |   2015-05-29 17:05:15 |   NULL                |   NULL
21210   |   238846  |   2015-05-29 17:05:55 |   NULL                |   NULL
21918   |   242650  |   2015-06-04 17:04:29 |   NULL                |   2015-06-12 15:47:23

我需要建立一个符合以下规则的数据集:

  1. 组是首先定义的,user_id因此我们应该只比较同一记录user_id
  2. 所有至少在其他任何记录开始,关闭或失效的15天内开始的记录都应计为组。
  3. 在每个组中,结束时间应计算为关闭的第一个记录,或者所有记录的值都为死值,我们采用死列的最大日期。
  4. 如果记录在另一个组开始或结束后的15天内未开始,则它将开始新的分组。

暂时,我相信我的数据应如下所示:

user_id | 开始 结束
-------------------------------------------------- ----
238846 | 2015-01-27 15:14:50 | 2015-02-02 14:14:13
259140 | 2015-03-23 14:32:16 | 2015-03-24 15:57:32
242650 | 2015-04-16 16:19:08 | 2015-04-16 16:21:06
242650 | 2015-05-21 13:09:06 | 空值
238846 | 2015-05-15 13:17:31 | 空值

谁能提供有关如何构建满足这些条件的查询的指导?

这是此问题中呈现的数据的DDL和DML语句链接

或者,我们可以跳过规则#2和#4,更简单地说,仅应包括彼此重叠的记录。更为重要的规则是,在给定的集合中,如果有一个关闭日期,则它将成为集合的结束日期,而不是最大的无效日期。


使用架构更改会更容易。不需要两列,即封闭的和固定的。只是有一个“结束”列,然后是结束的原因。
安德鲁·布伦南

你的第3个例子可以被编码为“如果一个ID为‘封闭’,那么它是一组本身就是因为似乎没有突出你所有的规则,请添加更多的例子。
里克·詹姆斯

Answers:


3

由于问题不够清晰,我提出了四种不同的解决方案。解决方案的不同之处在于:

  1. 根据Chris的回答,是否要“层叠”
  2. 当您有一个关闭日期时,是使用该组的最早日期还是关闭记录的开始日期。

请注意,此操作是在SQL Server而非MySQL中完成的。除了一些非常小的语法更改外,它应该可以正常工作。

四种方法的通用设置和样品数据

CREATE TABLE #example 
(
    id int NOT NULL DEFAULT '0',
    borrower_id int NOT NULL,
    started datetime NULL DEFAULT NULL,
    closed datetime NULL DEFAULT NULL,
    dead datetime NULL DEFAULT '0000-00-00 00:00:00'
);

CREATE TABLE #result 
(   
    borrower_id int NOT NULL DEFAULT '0',    
    started datetime NULL DEFAULT NULL,    
    ended datetime NULL DEFAULT NULL 
);    

INSERT INTO #example 
    (id, borrower_id, started, closed, dead) 
VALUES 
    (7714,238846,'2015-01-27 15:14:50','2015-02-02 14:14:13',NULL), 
    (7882,238846,'2015-01-28 13:25:58',NULL,'2015-05-15 12:16:07'), 
    (13190,259140,'2015-03-17 10:11:44',NULL,'2015-03-18 07:31:57'), 
    (13192,259140,'2015-03-17 10:12:17',NULL,'2015-03-18 11:46:46'), 
    (13194,259140,'2015-03-17 10:12:53',NULL,'2015-03-18 11:46:36'), 
    (14020,259140,'2015-03-23 14:32:16','2015-03-24 15:57:32',NULL), 
    (17124,242650,'2015-04-16 16:19:08','2015-04-16 16:21:06',NULL), 
    (19690,238846,'2015-05-15 13:17:31',NULL,'2015-05-27 13:56:43'), 
    (20038,242650,'2015-05-19 15:38:17',NULL,NULL), 
    (20040,242650,'2015-05-19 15:39:58',NULL,'2015-05-21 12:01:02'), 
    (20302,242650,'2015-05-21 13:09:06',NULL,NULL), 
    (20304,242650,'2015-05-21 13:09:54',NULL,NULL), 
    (20306,242650,'2015-05-21 13:10:19',NULL,NULL), 
    (20308,242650,'2015-05-21 13:12:20',NULL,NULL), 
    (21202,238846,'2015-05-29 16:47:29',NULL,NULL), 
    (21204,238846,'2015-05-29 16:47:56',NULL,NULL), 
    (21208,238846,'2015-05-29 17:05:15',NULL,NULL), 
    (21210,238846,'2015-05-29 17:05:55',NULL,NULL), 
    (21918,242650,'2015-06-04 17:04:29',NULL,'2015-06-12 15:47:23'); 

1.级联-使用封闭记录解决方案

我相信这是解决方案,寻找者正在寻找与之匹配的结果。

select *
into #temp1
from #example

while (select count(1) from #temp1)>0
begin
    --Grab only one user's records and place into a temp table to work with
    declare @curUser int
    set @curUser=(select min(borrower_id) from #temp1)

    select * 
    into #temp2
    from #temp1 t1
    where t1.borrower_id=@curUser

    while(select count(1) from #temp2)>0
    begin
        --Grab earliest start date and use as basis for 15 day window (#2 rule)
        --Use the record as basis for rules 3 and 4
        declare @minTime datetime
        set @minTime=(select min(started) from #temp2)

        declare @maxTime datetime
        set @maxTime=@minTime

        declare @curId int
        set @curId=(select min(id) from #temp2 where started=@minTime)

        select * 
        into #temp3
        from #temp2 t2
        where t2.id=@curId

        --Remove earliest record from pool of potential records to check rules against
        delete 
        from #temp2 
        where id=@curId

        --Insert all records within 15 days of start date, then remove record from pool
        while (select count(1) 
                from #temp2 t2 
                where t2.started<=DATEADD(day,15,@maxTime) 
                    or t2.closed<=DATEADD(day,15,@maxTime) 
                    or t2.dead<=DATEADD(day,15,@maxTime)  )>0
        begin
            insert into #temp3
            select *
            from #temp2 t2
            where t2.started<=DATEADD(day,15,@maxTime)  or t2.closed<=DATEADD(day,15,@maxTime)  or t2.dead<=DATEADD(day,15,@maxTime) 

            delete
            from #temp2
            where started<=DATEADD(day,15,@maxTime)  or closed<=DATEADD(day,15,@maxTime)  or dead<=DATEADD(day,15,@maxTime) 

            --set new max time from any column
            if (select max(started) from #temp3)>@maxTime
                set @maxTime=(select max(started) from #temp3)
            if (select max(closed) from #temp3)>@maxTime
                set @maxTime=(select max(started) from #temp3)
            if (select max(dead) from #temp3)>@maxTime
                set @maxTime=(select max(started) from #temp3)

        end

        --Calculate end time according to rule #3
        declare @end datetime 
        set @end = null
        set @end=(select min(closed) from #temp3)

        if @end is not null
        begin
            set @minTime=(select started from #temp3 where closed=@end)
        end

        if @end is null
        begin
            if(select count(1) from #temp3 where dead is null)=0
            set @end= (select max(dead) from #temp3)
        end

        insert into #result (borrower_id,started,ended)
        values (@curUser,@minTime,@end)

        drop table #temp3
    end

    --Done with the one user, remove him from temp table and iterate thru to the next user
    delete  
    from #temp1 
    where borrower_id=@curUser    

    drop table #temp2

end

drop table #temp1

drop table #example

select * from #result order by started

drop table #result

2.非级联-使用封闭记录解决方案

从可用的第一个截止日期开始计算,然后从最早的开始日期开始计算。

select *
into #temp1
from #example

while (select count(1) from #temp1)>0
begin
    --Grab only one user's records and place into a temp table to work with
    declare @curUser int
    set @curUser=(select min(borrower_id) from #temp1)

    select * 
    into #temp2
    from #temp1 t1
    where t1.borrower_id=@curUser

    while(select count(1) from #temp2)>0
    begin
        --Grab earliest start date and use as basis for 15 day window (#2 rule)
        --Use the record as basis for rules 3 and 4
        declare @minTime datetime
        set @minTime=(select min(started) from #temp2)

        declare @curId int
        set @curId=(select min(id) from #temp2 where started=@minTime)

        select * 
        into #temp3
        from #temp2 t2
        where t2.id=@curId

        --Remove earliest record from pool of potential records to check rules against
        delete 
        from #temp2 
        where id=@curId

        --Insert all records within 15 days of start date, then remove record from pool
        insert into #temp3
        select *
        from #temp2 t2
        where t2.started<=DATEADD(day,15,@minTime)

        delete
        from #temp2
        where started<=DATEADD(day,15,@minTime)

        --Insert all records within 15 days of closed, then remove record from pool
        insert into #temp3
        select *
        from #temp2 t2
        where t2.closed<=DATEADD(day,15,@minTime)

        delete
        from #temp2
        where closed<=DATEADD(day,15,@minTime)

        --Insert all records within 15 days of dead, then remove record from pool
        insert into #temp3
        select *
        from #temp2 t2
        where t2.dead<=DATEADD(day,15,@minTime)

        delete
        from #temp2
        where dead<=DATEADD(day,15,@minTime)

        --Calculate end time according to rule #3
        declare @end datetime 
        set @end = null
        set @end=(select min(closed) from #temp3)

        if @end is not null
        begin
            set @minTime=(select started from #temp3 where closed=@end)
        end

        if @end is null
        begin
            if(select count(1) from #temp3 where dead is null)=0
            set @end= (select max(dead) from #temp3)
        end

        insert into #result (borrower_id,started,ended)
        values (@curUser,@minTime,@end)

        drop table #temp3
    end

    --Done with the one user, remove him from temp table and iterate thru to the next user
    delete  
    from #temp1 
    where borrower_id=@curUser


    drop table #temp2

end

drop table #temp1

drop table #example

select * from #result

drop table #result

3.非层叠-使用最早的日期解决方案

仅按最早日期开始计算。

select *
into #temp1
from #example

while (select count(1) from #temp1)>0
begin
    --Grab only one user's records and place into a temp table to work with
    declare @curUser int
    set @curUser=(select min(borrower_id) from #temp1)

    select * 
    into #temp2
    from #temp1 t1
    where t1.borrower_id=@curUser

    while(select count(1) from #temp2)>0
    begin
        --Grab earliest start date and use as basis for 15 day window (#2 rule)
        --Use the record as basis for rules 3 and 4
        declare @minTime datetime
        set @minTime=(select min(started) from #temp2)

        declare @curId int
        set @curId=(select min(id) from #temp2 where started=@minTime)

        select * 
        into #temp3
        from #temp2 t2
        where t2.id=@curId

        --Remove earliest record from pool of potential records to check rules against
        delete 
        from #temp2 
        where id=@curId

        --Insert all records within 15 days of start date, then remove record from pool
        insert into #temp3
        select *
        from #temp2 t2
        where t2.started<=DATEADD(day,15,@minTime) or t2.closed<=DATEADD(day,15,@minTime) or t2.dead<=DATEADD(day,15,@minTime)

        delete
        from #temp2
        where started<=DATEADD(day,15,@minTime) or closed<=DATEADD(day,15,@minTime) or dead<=DATEADD(day,15,@minTime)

        --Calculate end time according to rule #3
        declare @end datetime 
        set @end = null

        set @end=(select min(closed) from #temp3)

        if @end is null
        begin
            if(select count(1) from #temp3 where dead is null)=0
            set @end= (select max(dead) from #temp3)
        end

        insert into #result (borrower_id,started,ended)
        values (@curUser,@minTime,@end)

        drop table #temp3
    end

    --Done with the one user, remove him from temp table and itterate thru to the next user
    delete  
    from #temp1 
    where borrower_id=@curUser    

    drop table #temp2

end

drop table #temp1

drop table #example

select * from #result

drop table #result

4.级联-使用最早的日期解决方案

仅按最早日期开始计算。

select *
into #temp1
from #example

while (select count(1) from #temp1)>0
begin
--Grab only one user's records and place into a temp table to work with
declare @curUser int
set @curUser=(select min(borrower_id) from #temp1)

select * 
into #temp2
from #temp1 t1
where t1.borrower_id=@curUser

while(select count(1) from #temp2)>0
begin
    --Grab earliest start date and use as basis for 15 day window (#2 rule)
    --Use the record as basis for rules 3 and 4
        declare @minTime datetime
    set @minTime=(select min(started) from #temp2)


    declare @maxTime datetime
    set @maxTime=@minTime

    declare @curId int
    set @curId=(select min(id) from #temp2 where started=@minTime)

    select * 
    into #temp3
    from #temp2 t2
    where t2.id=@curId

    --Remove earliest record from pool of potential records to check rules against
    delete 
    from #temp2 
    where id=@curId

    --Insert all records within 15 days of start date, then remove record from pool
    while (select count(1) 
            from #temp2 t2 
            where t2.started<=DATEADD(day,15,@maxTime) 
                or t2.closed<=DATEADD(day,15,@maxTime) 
                or t2.dead<=DATEADD(day,15,@maxTime)  )>0
    begin
        insert into #temp3
        select *
        from #temp2 t2
        where t2.started<=DATEADD(day,15,@maxTime)  or t2.closed<=DATEADD(day,15,@maxTime)  or t2.dead<=DATEADD(day,15,@maxTime) 

        delete
        from #temp2
        where started<=DATEADD(day,15,@maxTime)  or closed<=DATEADD(day,15,@maxTime)  or dead<=DATEADD(day,15,@maxTime) 

        --set new max time from any column
        if (select max(started) from #temp3)>@maxTime
            set @maxTime=(select max(started) from #temp3)
        if (select max(closed) from #temp3)>@maxTime
            set @maxTime=(select max(started) from #temp3)
        if (select max(dead) from #temp3)>@maxTime
            set @maxTime=(select max(started) from #temp3)

    end

    --Calculate end time according to rule #3
    declare @end datetime 
    set @end = null

    set @end=(select min(closed) from #temp3)

    if @end is null
    begin
        if(select count(1) from #temp3 where dead is null)=0
        set @end= (select max(dead) from #temp3)
    end

    insert into #result (borrower_id,started,ended)
    values (@curUser,@minTime,@end)

    drop table #temp3
end

--Done with the one user, remove him from temp table and iterate thru to the next user
delete  
from #temp1 
where borrower_id=@curUser

drop table #temp2

end

drop table #temp1

drop table #example

select * from #result order by started

drop table #result

-2

我担心我们可能不清楚如何定义组。我之所以这样说是因为,根据某些未声明的条件,上述日期将形成一个巨大的单一组,或者由三个组组成,其中一组在该组中占主导地位。

缺少分组条件?

1)这15天规则会级联吗?如果某条记录Y在另一条记录之后10天开始X,然后又有另一条记录Z在此之后10天开始,那么这是否构成一组三个记录X,Y,Z,或两个组,每个包含两个记录X,YY,Z?我假设15天规则会级联形成更大的组。

2)日期是否包含在内?例如,如果一条记录的开始日期为几个月,而几个月后为结束日期,那么该范围内的所有日期是否都合并到该组中?我在下面的快速分析中将两种可能性都考虑在内。

潜在分组

因此,如果我们以id开头,则会7714看到开始日期为1/27。显然,下一个7882从1/28开始的条目属于该组。但是请注意,7882结束于5/15,因此必须将在5/15的15天内开始的所有内容添加到该组中。

因此,19690通过21210添加到组中,通过级联导致21918随后将其添加到组中。级联已消耗了集中的几乎所有条目。叫这个GROUP A

但是,如果分组也包括日期在内,则从13190到的所有条目17124也必须属于GROUP A,现在所有ID都在一个组中。

如果从日期GROUP A是不包括,但实际上严格遵守与级联规则后” '15的一天,然后代替你将不得不组成的第二组13190通过14020,并与单一项目的第三组17124

从本质上讲,我的问题是,其中的任何一个是否与您想要的分组匹配,或者在分组定义中是否缺少其他信息?如此冗长的回答很抱歉,但您请求的临时输出似乎未达到您的分组定义。

通过澄清,我相信我们可以解决这个问题。


如果我完全摆脱了15天规则怎么办?这样可以简化问题吗?
Noah Goodrich 2015年

2
另外,我认为您错过了将第一个截止日期优先于最后一个失效日期的机会。结果,对于从1/27开始的第一个分组,结束日期2/2成为该组的结束,而不是5/15。
Noah Goodrich 2015年

Yikes,您是对的,我确实误解了您对第一个封闭/最后死者的说法...对不起,我是在太平洋时间晚上12:30在昨晚进行此操作的,所以我可能有点困。:)另外,我认为按用户数据进行的其他分组可能会有所帮助。我会再考虑一下,然后尝试与您联系。
克里斯(Chris
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.