答案部分
有多种方法可以使用不同的T-SQL构造来重写它。我们将看一下优缺点,并在下面进行整体比较。
首先:使用OR
SELECT COUNT(*)
FROM dbo.Users AS u
WHERE u.Age < 18
OR u.Age IS NULL;
使用OR
为我们提供了更有效的Seek计划,该计划可以读取所需的确切行数,但是它将技术世界称为a whole mess of malarkey
查询计划的内容。
还要注意,Seek在这里执行了两次,从图形运算符中应该更明显:
Table 'Users'. Scan count 2, logical reads 8233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 469 ms, elapsed time = 473 ms.
第二篇:在UNION ALL
我们的查询中使用派生表也可以这样重写
SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records);
这样就产生了相同类型的计划,而少了一些恶意键,并且对于寻找(搜索到)该索引多少次的诚实度有了更明显的认识。
它执行与查询相同的读取次数(8233)OR
,但是节省了大约100毫秒的CPU时间。
CPU time = 313 ms, elapsed time = 315 ms.
但是,您在这里必须非常小心,因为如果该计划尝试并行执行,那么两个单独的COUNT
操作将被序列化,因为它们每个都被视为全局标量集合。如果我们使用跟踪标志8649强制执行并行计划,问题将变得很明显。
SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);
可以通过稍微更改查询来避免这种情况。
SELECT SUM(Records)
FROM
(
SELECT 1
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT 1
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);
现在,执行Seek的两个节点都完全并行化,直到我们命中串联运算符。
就其价值而言,完全并行的版本具有一些不错的好处。以大约100次以上的读取和大约90ms的额外CPU时间为代价,经过的时间减少到93ms。
Table 'Users'. Scan count 12, logical reads 8317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 500 ms, elapsed time = 93 ms.
那CROSS APPLY呢?
没有魔力,没有答案是完整的CROSS APPLY
!
不幸的是,我们遇到了更多的问题COUNT
。
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT COUNT(Id)
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age IS NULL
) x (Records);
这个计划太可怕了。这是您最后一次出现在圣帕特里克节时最终要制定的计划。尽管很好地并行,但由于某种原因,它正在扫描PK / CX。真是的 该计划的成本为2198美元。
Table 'Users'. Scan count 7, logical reads 31676233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 29532 ms, elapsed time = 5828 ms.
这是一个奇怪的选择,因为如果我们强迫它使用非聚集索引,则成本将大大下降至1798查询美元。
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT COUNT(Id)
FROM dbo.Users AS u2 WITH (INDEX(ix_Id_Age))
WHERE u2.Id = u.Id
AND u2.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u2 WITH (INDEX(ix_Id_Age))
WHERE u2.Id = u.Id
AND u2.Age IS NULL
) x (Records);
嘿,寻求!在那儿检查你。还要注意,有了的魔力CROSS APPLY
,我们不需要费劲就能拥有几乎完全并行的计划。
Table 'Users'. Scan count 5277838, logical reads 31685303, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 27625 ms, elapsed time = 4909 ms.
如果没有任何COUNT
东西,交叉申请的结果会更好。
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT 1
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age < 18
UNION ALL
SELECT 1
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age IS NULL
) x (Records);
该计划看起来不错,但是读取和CPU并没有改善。
Table 'Users'. Scan count 20, logical reads 17564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4844 ms, elapsed time = 863 ms.
将交叉重写为派生联接所产生的结果完全相同。我不会重新发布查询计划和统计信息-它们确实没有改变。
SELECT COUNT(u.Id)
FROM dbo.Users AS u
JOIN
(
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x ON x.Id = u.Id;
关系代数:要彻底讲解,并防止Joe Celko困扰我的梦想,我们至少需要尝试一些奇怪的关系事物。没什么了!
尝试 INTERSECT
SELECT COUNT(*)
FROM dbo.Users AS u
WHERE NOT EXISTS ( SELECT u.Age WHERE u.Age >= 18
INTERSECT
SELECT u.Age WHERE u.Age IS NOT NULL );
Table 'Users'. Scan count 1, logical reads 9157, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1094 ms, elapsed time = 1090 ms.
这是尝试 EXCEPT
SELECT COUNT(*)
FROM dbo.Users AS u
WHERE NOT EXISTS ( SELECT u.Age WHERE u.Age >= 18
EXCEPT
SELECT u.Age WHERE u.Age IS NULL);
Table 'Users'. Scan count 7, logical reads 9247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2126 ms, elapsed time = 376 ms.
可能还有其他写这些方法的方法,但是我会让那些可能会使用EXCEPT
并且INTERSECT
比我更常使用的人留给他们。
如果您真的只需要计数,
我会COUNT
在查询中使用它作为速记(请阅读:我很懒,有时无法提出更多涉及的场景)。如果只需要计数,则可以使用CASE
表达式执行几乎相同的操作。
SELECT SUM(CASE WHEN u.Age < 18 THEN 1
WHEN u.Age IS NULL THEN 1
ELSE 0 END)
FROM dbo.Users AS u
SELECT SUM(CASE WHEN u.Age < 18 OR u.Age IS NULL THEN 1
ELSE 0 END)
FROM dbo.Users AS u
这些都具有相同的计划,并具有相同的CPU和读取特性。
Table 'Users'. Scan count 1, logical reads 9157, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 719 ms, elapsed time = 719 ms.
赢家?
在我的测试中,在派生表上使用SUM进行强制并行计划的效果最佳。是的,可以通过添加几个过滤索引来说明这两个谓词来辅助其中的许多查询,但是我想对其他谓词进行一些试验。
SELECT SUM(Records)
FROM
(
SELECT 1
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT 1
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);
谢谢!
NOT EXISTS ( INTERSECT / EXCEPT )
查询可以不包含以下INTERSECT / EXCEPT
部分而工作:WHERE NOT EXISTS ( SELECT u.Age WHERE u.Age >= 18 );
另一种方式-使用EXCEPT
:(SELECT COUNT(*) FROM (SELECT UserID FROM dbo.Users EXCEPT SELECT UserID FROM dbo.Users WHERE u.Age >= 18) AS u ;
其中UserID是PK或任何唯一的非null列)。