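This is an attempt at an algorithm. It's not perfect, and depending on how much time you want to spend optimizing it, there are probably some further small gains to be made.

Let's assume you have a table of tasks that are to be executed by four queues. You know the amount of work associated with each task, and you want all four queues to get an almost equal amount of work, so that all of them finish at about the same time.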
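First, I'd partition the tasks round-robin, ordered by size from smallest to largest:

```sql
SELECT [time], ROW_NUMBER() OVER (ORDER BY [time])%4 AS grp, 0
```

`ROW_NUMBER()` orders every row by size and assigns it a row number, starting at 1. Using modulo 4, each row number is then assigned a "group" (the `grp` column) on a round-robin basis: the first row is group 1, the second is group 2, then 3, the fourth row gets group 0, and so on.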
```
time  ROW_NUMBER()  grp
----  ------------  ---
   1             1    1
  10             2    2
  12             3    3
  15             4    0
  19             5    1
  22             6    2
...
```
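To check the round-robin assignment yourself, here's a minimal sketch that runs the same `ROW_NUMBER() ... %4` expression against the first six task sizes from the table above. The sizes are supplied inline with a `VALUES` constructor instead of a real task table; the derived table name `t` and the `rn` alias are just for illustration.

```sql
--- Hypothetical inline data: the first six task sizes from the table above.
SELECT t.[time],
       ROW_NUMBER() OVER (ORDER BY t.[time])   AS rn,   --- 1, 2, 3, 4, 5, 6
       ROW_NUMBER() OVER (ORDER BY t.[time])%4 AS grp   --- 1, 2, 3, 0, 1, 2
FROM (VALUES (1), (10), (12), (15), (19), (22)) AS t([time])
ORDER BY t.[time];
```

Because the rows are numbered in order of size, each group ends up with a similar mix of small and large tasks, which is a reasonable starting point for the swapping step.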
For ease of use, I'm storing the `time` and `grp` columns in a table variable called `@work`.
Now we can perform a few calculations on this data:

```sql
WITH cte AS (
    SELECT *, SUM([time]) OVER (PARTITION BY grp)
             -SUM([time]) OVER ()/4 AS _grpoffset
    FROM @work)
...
```
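The `_grpoffset` column is how much the total `time` of each `grp` differs from the "ideal" average. If the total `time` of all tasks is 1000 and there are four groups, there should ideally be a total of 250 in each group. If a group contains a total of 268, that group's `_grpoffset` is 18.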
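The idea is to identify the two best rows, one in a "positive" group (too much work) and one in a "negative" group (too little work). If we swap the groups of those two rows, we can reduce the absolute `_grpoffset` of both groups.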
Example:
```
time  grp  total  _grpoffset
----  ---  -----  ----------
   3    1    222          40
  46    1    222          40
  73    1    222          40
 100    1    222          40
   6    2    134         -48
  52    2    134         -48
  76    2    134         -48
  11    3    163         -19
  66    3    163         -19
  86    3    163         -19
  45    0    208          26
  71    0    208          26
  92    0    208          26
----
=727
```
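The total is 727, so for a perfect distribution each group should have a score of about 182 (727/4 = 181.75). The difference between a group's score and 182 is what we've put in the `_grpoffset` column.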
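As you can see, in the best case we should move about 40 points' worth of rows from group 1 to group 2, and about 20 to 25 points' worth from group 0 to group 3.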
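A quick way to double-check these numbers is to run the `_grpoffset` expression against the thirteen example rows, passed in through `VALUES`. The CTE name `example` and the `total` column are made up for this sketch; the column names otherwise mirror `@work`. One caveat: `SUM([time]) OVER ()/4` is integer division on an `int`, so the code measures against 727/4 = 181 rather than 182, and the offsets it returns are 41, -47, -18 and 27.

```sql
--- Hypothetical inline copy of the 13 example rows (names mirror @work).
WITH example AS (
    SELECT [time], grp
    FROM (VALUES (3, 1), (46, 1), (73, 1), (100, 1),
                 (6, 2), (52, 2), (76, 2),
                 (11, 3), (66, 3), (86, 3),
                 (45, 0), (71, 0), (92, 0)) AS v([time], grp)),
cte AS (
    SELECT *, SUM([time]) OVER (PARTITION BY grp) AS total,
              SUM([time]) OVER (PARTITION BY grp)
             -SUM([time]) OVER ()/4 AS _grpoffset   --- 727/4 = 181 (integer division)
    FROM example)
SELECT DISTINCT grp, total, _grpoffset
FROM cte
ORDER BY grp;
--- grp 0: 208, 27 | grp 1: 222, 41 | grp 2: 134, -47 | grp 3: 163, -18
```

The one-point difference from rounding to 182 doesn't change which groups are positive or negative here, so the swapping logic behaves the same either way for this data.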
Here's the code that identifies those candidate rows:

```sql
SELECT TOP 1 pos._row AS _pos_row, pos.grp AS _pos_grp,
             neg._row AS _neg_row, neg.grp AS _neg_grp
FROM cte AS pos
INNER JOIN cte AS neg ON
    pos._grpoffset>0 AND
    neg._grpoffset<0 AND
    --- To prevent an infinite loop:
    pos.moved<4 AND
    neg.moved<4
WHERE --- must improve positive side's offset:
      ABS(pos._grpoffset-pos.[time]+neg.[time])<=pos._grpoffset AND
      --- must improve negative side's offset:
      ABS(neg._grpoffset-neg.[time]+pos.[time])<=ABS(neg._grpoffset)
--- Largest changes first:
ORDER BY ABS(pos.[time]-neg.[time]) DESC
```
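I'm self-joining the common table expression `cte` that we created earlier: on one side, the rows in groups with a positive `_grpoffset`; on the other, the rows in groups with a negative `_grpoffset`. To further filter which rows should be matched, swapping the positive and negative rows must improve the `_grpoffset` of both groups, i.e. bring it closer to 0 (strictly speaking, the `<=` comparisons only require that it doesn't move further away). The `moved` counter limits each row to four swaps, which guarantees that the loop eventually terminates.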
The `TOP 1` and `ORDER BY` pick the "best" match, the pair with the largest difference in `time`, to swap first.
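To see the two conditions and the `ORDER BY` at work, here's a sketch that applies the same `WHERE` clause to groups 1 and 2 from the example, with their offsets (40 and -48) hard-coded from the table rather than calculated. Restricting it to two groups is a simplification for illustration: the real query pairs rows across every positive and every negative group, and the CTE names `pos`/`neg` and the `new_*_offset` columns are made up here.

```sql
WITH pos AS (   --- group 1 in the example: too much work
    SELECT [time], 40 AS _grpoffset
    FROM (VALUES (3), (46), (73), (100)) AS v([time])),
neg AS (        --- group 2 in the example: too little work
    SELECT [time], -48 AS _grpoffset
    FROM (VALUES (6), (52), (76)) AS v([time]))
SELECT TOP 1
       pos.[time]                           AS pos_time,
       neg.[time]                           AS neg_time,
       pos._grpoffset-pos.[time]+neg.[time] AS new_pos_offset,
       neg._grpoffset-neg.[time]+pos.[time] AS new_neg_offset
FROM pos
CROSS JOIN neg
--- Both groups' absolute offsets must not grow:
WHERE ABS(pos._grpoffset-pos.[time]+neg.[time])<=pos._grpoffset
  AND ABS(neg._grpoffset-neg.[time]+pos.[time])<=ABS(neg._grpoffset)
--- Largest changes first:
ORDER BY ABS(pos.[time]-neg.[time]) DESC;
--- pos_time 73, neg_time 6, new_pos_offset -27, new_neg_offset 19
```

Of the twelve possible pairs, five pass both conditions, and `TOP 1` picks swapping 73 with 6 because it has the largest difference (67). Note that swapping 46 with 6 would have brought group 1 to exactly 0, so "largest changes first" is a heuristic that relies on later iterations to fine-tune the result.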
Now all we need to do is wrap this in an `UPDATE` and loop it until no more improvements can be found.
TL;DR: here's the complete code:
```sql
DECLARE @work TABLE (
    _row    int IDENTITY(1, 1) NOT NULL,
    [time]  int NOT NULL,
    grp     int NOT NULL,
    moved   tinyint NOT NULL,
    PRIMARY KEY CLUSTERED ([time], _row)
);

--- Test data: 101 tasks with random sizes from 1 to 100, assigned
--- round-robin to groups 1, 2, 3, 0 in order of ascending size.
WITH cte AS (
    SELECT 0 AS n, CAST(1+100*RAND(CHECKSUM(NEWID())) AS int) AS [time]
    UNION ALL
    SELECT n+1,    CAST(1+100*RAND(CHECKSUM(NEWID())) AS int) AS [time]
    FROM cte WHERE n<100)

INSERT INTO @work ([time], grp, moved)
SELECT [time], ROW_NUMBER() OVER (ORDER BY [time])%4 AS grp, 0
FROM cte;

--- Swap one pair of rows per iteration. The loop ends when the UPDATE
--- finds no pair to swap, i.e. when @@ROWCOUNT drops to 0.
WHILE (@@ROWCOUNT!=0)
BEGIN
    WITH cte AS (
        SELECT *, SUM([time]) OVER (PARTITION BY grp)
                 -SUM([time]) OVER ()/4 AS _grpoffset
        FROM @work)

    UPDATE w
    --- The positive row takes the negative row's group, and vice versa:
    SET w.grp=(CASE w._row
               WHEN x._pos_row THEN x._neg_grp
               ELSE x._pos_grp END),
        w.moved=w.moved+1
    FROM @work AS w
    INNER JOIN (
        SELECT TOP 1 pos._row AS _pos_row, pos.grp AS _pos_grp,
                     neg._row AS _neg_row, neg.grp AS _neg_grp
        FROM cte AS pos
        INNER JOIN cte AS neg ON
            pos._grpoffset>0 AND
            neg._grpoffset<0 AND
            --- To prevent an infinite loop:
            pos.moved<4 AND
            neg.moved<4
        WHERE --- must improve positive side's offset:
              ABS(pos._grpoffset-pos.[time]+neg.[time])<=pos._grpoffset AND
              --- must improve negative side's offset:
              ABS(neg._grpoffset-neg.[time]+pos.[time])<=ABS(neg._grpoffset)
        --- Largest changes first:
        ORDER BY ABS(pos.[time]-neg.[time]) DESC
        ) AS x ON w._row IN (x._pos_row, x._neg_row);
END;
```