我有一个查询,需要根据标量UDF的结果进行过滤。该查询必须作为单个语句发送(因此我不能将UDF结果分配给局部变量),并且不能使用TVF。我知道由标量UDF引起的性能问题,其中包括强制整个计划以串行方式运行,过多的内存授予,基数估计问题以及缺少内联。对于这个问题,请假设我需要使用标量UDF。
UDF本身的调用成本非常高,但是从理论上讲,查询可以由优化器以合理的方式实现,使得函数只需要计算一次即可。我为这个问题模拟了一个大大简化的例子。以下查询需要6152毫秒才能在我的计算机上执行:
SELECT x1.ID
FROM dbo.X_100_INTEGERS x1
WHERE x1.ID >= dbo.EXPENSIVE_UDF();
查询计划中的过滤器运算符建议针对每行对该函数进行一次评估:
DDL和数据准备:
CREATE OR ALTER FUNCTION dbo.EXPENSIVE_UDF () RETURNS INT
AS
BEGIN
DECLARE @tbl TABLE (VAL VARCHAR(5));
-- make the function expensive to call
INSERT INTO @tbl
SELECT [VALUE]
FROM STRING_SPLIT(REPLICATE(CAST('Z ' AS VARCHAR(MAX)), 20000), ' ');
RETURN 1;
END;
GO
DROP TABLE IF EXISTS dbo.X_100_INTEGERS;
CREATE TABLE dbo.X_100_INTEGERS (ID INT NOT NULL);
-- insert 100 integers from 1 - 100
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.X_100_INTEGERS WITH (TABLOCK)
SELECT n FROM Nums WHERE n <= 100;
这是上面示例的db fiddle链接,尽管在此处执行代码大约需要18秒。
在某些情况下,由于供应商提供的功能,我可能无法编辑该功能的代码。在其他情况下,我可以进行更改。如何强制在一个查询中仅对一次标量UDF求值?