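In general, I have two kinds of time intervals: presence time and absence time.
Absence time can be of different types (e.g. breaks, absences, special days, etc.), and the intervals may overlap and/or intersect.
It is not guaranteed that the raw data contains only plausible combinations of intervals. For example, overlapping presence intervals make no sense, but they may exist. I have tried several ways to determine the resulting presence intervals, and the most convenient one for me seems to be the following: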
;with "timestamps"
as
(
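select
-- all events numbered consecutively by employee and time; at equal timestamps closings (-1) sort before openings (+1)
"id" = row_number() over ( order by "empId", "timestamp", "opening", "type" )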
, "empId"
, "timestamp"
, "type"
, "opening"
from
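(
-- one row per interval boundary: starttime -> "opening" = 1, endtime -> "opening" = -1
-- "type": 1 = worktime, 2 = break, 3 = absence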
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 1 as "type" from "worktime" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 2 as "type" from "break" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 3 as "type" from "absence" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
) as data
)
select
T1."empId"
, "starttime" = T1."timestamp"
, "endtime" = T2."timestamp"
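from
"timestamps" as T1
-- T2: the next event of the same employee ends the segment that starts at T1
left join "timestamps" as T2
on T2."empId" = T1."empId"
and T2."id" = T1."id" + 1
-- RS: all events of the employee up to and including T1, summed into the rolling value
left join "timestamps" as RS
on RS."empId" = T1."empId"
and RS."id" <= T1."id"
group by
T1."empId", T1."timestamp", T2."timestamp"
having
-- 2 = power(2, 1): one open worktime interval (type 1) and no open break (type 2) or absence (type 3)
(sum( power( 2, RS."type" ) * RS."opening" ) = 2)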
order by
T1."empId", T1."timestamp";
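The HAVING condition can be read as a small encoding of "what is open right now". A Python sketch of that arithmetic, assuming the type codes used in the CTE (1 = worktime, 2 = break, 3 = absence); WEIGHTS, rolling_value and is_presence are hypothetical names for illustration only:

```python
# Weight per interval type, as computed by power(2, "type") in the query.
# Assumed codes (from the literals in the CTE): 1 = worktime, 2 = break, 3 = absence.
WEIGHTS = {1: 2 ** 1, 2: 2 ** 2, 3: 2 ** 3}


def rolling_value(open_counts):
    """Value the HAVING clause sees at one event.

    open_counts maps a type code to the number of intervals of that type
    that are open at this point (openings minus closings so far).
    """
    return sum(WEIGHTS[t] * n for t, n in open_counts.items())


def is_presence(open_counts):
    # Mirrors: having sum(power(2, RS."type") * RS."opening") = 2
    return rolling_value(open_counts) == 2


print(rolling_value({1: 1}))        # one worktime open -> 2
print(rolling_value({1: 1, 2: 1}))  # worktime and a break open -> 6
print(rolling_value({1: 2}))        # two overlapping worktime intervals -> 4
```

Because the expression is a sum of counts rather than a bitwise OR, two overlapping worktime intervals give 4, the same value as a single open break, so such a stretch is not reported as presence. Counting open intervals per type separately (worktime > 0, break = 0, absence = 0) would avoid that collision; this is a possible variation, not what the query above does.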
See the SQL-Fiddle for some demo data.
The raw data is stored in different tables, either as "starttime" - "endtime" or as "starttime" - "duration".
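Before unpivoting, rows stored as "starttime" - "duration" have to be turned into start/end pairs. A minimal Python sketch of that normalisation step, assuming durations are available as time spans (in T-SQL this would typically be a DATEADD on "starttime" with whatever unit the duration column uses); the row layout, the sample values and the function name normalise are hypothetical:

```python
from datetime import datetime, timedelta


def normalise(row):
    """Return (empId, starttime, endtime) for a row stored either way.

    Assumption: a row is a dict holding "empId", "starttime" and either
    "endtime" or "duration" (here a timedelta); these shapes are illustrative.
    """
    start = row["starttime"]
    if "endtime" in row:
        end = row["endtime"]
    else:
        end = start + row["duration"]
    return row["empId"], start, end


rows = [
    {"empId": 1, "starttime": datetime(2013, 1, 7, 8, 0), "endtime": datetime(2013, 1, 7, 16, 0)},
    {"empId": 1, "starttime": datetime(2013, 1, 7, 12, 0), "duration": timedelta(minutes=30)},
]
for r in rows:
    print(normalise(r))
```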
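The idea is to build an ordered list of all timestamps and to keep, at each of them, a "bitmasked" rolling sum of the open intervals, which is then used to evaluate the presence time.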
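The same idea as a single sweep over the events, sketched in Python for illustration: one pass with a running sum replaces the self-join on RS. Assumptions: intervals arrive as (empId, type, start, end) tuples with the type codes from the query, and presence_intervals, the constants and the sample data are hypothetical. Unlike the query, this sketch drops zero-length segments.

```python
from datetime import datetime

# Hypothetical type codes matching the literals in the query.
WORKTIME, BREAK, ABSENCE = 1, 2, 3


def presence_intervals(intervals):
    """intervals: iterable of (empId, type, start, end).

    Returns (empId, start, end) segments during which only worktime is open.
    """
    events = []
    for emp, typ, start, end in intervals:
        events.append((emp, start, 1, typ))   # interval opens
        events.append((emp, end, -1, typ))    # interval closes
    # Same order as row_number() over (order by empId, timestamp, opening, type):
    # at equal timestamps, closings (-1) sort before openings (+1).
    events.sort()

    result = []
    rolling, prev_emp = 0, None
    for i, (emp, ts, opening, typ) in enumerate(events):
        if emp != prev_emp:                   # the rolling sum is per employee
            rolling, prev_emp = 0, emp
        rolling += (2 ** typ) * opening
        # The segment runs until the next event of the same employee.
        if rolling == 2 and i + 1 < len(events) and events[i + 1][0] == emp:
            next_ts = events[i + 1][1]
            if next_ts > ts:                  # unlike the query, skip zero-length segments
                result.append((emp, ts, next_ts))
    return result


day = datetime(2013, 1, 7)
data = [
    (1, WORKTIME, day.replace(hour=8), day.replace(hour=16)),
    (1, BREAK, day.replace(hour=12), day.replace(hour=12, minute=30)),
    (1, ABSENCE, day.replace(hour=15), day.replace(hour=17)),
]
for seg in presence_intervals(data):
    print(seg)
```

For the sample data this yields 08:00 to 12:00 and 12:30 to 15:00: the break cuts the workday, and the absence starting at 15:00 ends presence before the worktime interval does.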
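The fiddle works and gives the expected results, even when the start times of different intervals are equal. No indexes are used in this example.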
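Is this the right way to accomplish the task, or is there a more elegant way?
In case it matters for an answer: there will be up to ten thousand rows per employee per table. SQL Server 2012 is not available, so the rolling sum over the preceding rows cannot be computed inline with a windowed aggregate.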
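Edit:
I just ran the query against larger amounts of test data (1,000, 10,000, 100,000 and 1 million rows) and can see the runtime growing exponentially. Obviously a warning sign, right?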
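A likely cause of the steep growth is the join on RS."id" <= T1."id": it is a triangular join, so for each employee the number of intermediate rows grows with the square of the number of events (strictly speaking quadratic rather than exponential, but steep enough to look like it over these sizes). A quick Python calculation of that row count for a single employee, where each interval contributes two events; the function name is hypothetical:

```python
def triangular_join_rows(n_events):
    """Rows the RS join produces for one employee with n_events events.

    Event k (1-based) is joined to every RS row with RS.id <= k, i.e. k rows,
    so the total is 1 + 2 + ... + n = n * (n + 1) / 2.
    """
    return n_events * (n_events + 1) // 2


for n in (1_000, 10_000, 100_000):
    print(n, triangular_join_rows(n))  # 500500, 50005000, 5000050000
```

With several employees, the total is the sum of these per-employee counts, so it also depends on how the test data is distributed across employees.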
I changed the query and replaced the aggregated rolling sum with a quirky update.
I added a helper table:
create table timestamps
(
"id" int
, "empId" int
, "timestamp" datetime
, "type" int
, "opening" int
, "rolSum" int
)
create nonclustered index "idx" on "timestamps" ( "rolSum" ) include ( "id", "empId", "timestamp" )
and moved the rolling-sum calculation there:
declare @rolSum int = 0
update "timestamps" set @rolSum = "rolSum" = @rolSum + power( 2, "type" ) * "opening" from "timestamps"
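The quirky update yields a correct running sum only if the rows happen to be processed in "id" order. The UPDATE statement itself does not promise any order. The precautions usually cited for this technique are a clustered index on the ordering column and hints such as TABLOCKX and OPTION (MAXDOP 1); the helper table above only has a nonclustered index on "rolSum". The Python simulation below shows what happens when the visit order changes. The function quirky_update and the sample rows are hypothetical:

```python
def quirky_update(rows, visit_order):
    """Simulate: update ... set @rolSum = "rolSum" = @rolSum + power(2, "type") * "opening".

    rows maps "id" -> ("type", "opening"); visit_order is the order in which
    the engine happens to touch the rows. Returns "id" -> "rolSum".
    """
    rol_sum = 0                      # plays the role of @rolSum
    result = {}
    for row_id in visit_order:
        typ, opening = rows[row_id]
        rol_sum += (2 ** typ) * opening
        result[row_id] = rol_sum     # value written to "rolSum"
    return result


# Hypothetical events for one employee: worktime opens, break opens, break closes, worktime closes.
rows = {1: (1, 1), 2: (2, 1), 3: (2, -1), 4: (1, -1)}
print(quirky_update(rows, [1, 2, 3, 4]))  # "id" order: {1: 2, 2: 6, 3: 2, 4: 0}
print(quirky_update(rows, [2, 1, 3, 4]))  # other order: {2: 4, 1: 6, 3: 2, 4: 0}
```

In the second run, row 1 ends up with 6 instead of 2, so the presence segment that starts there would be missed.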
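With 1 million rows in the "worktime" table, the runtime dropped to 3 seconds.
The question remains the same: what is the most efficient way to solve this?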
I guess I like [this] better than double quotes.