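The MERGE statement has a complex syntax and an even more complex implementation, but in essence the idea is to join two tables, filter down to the rows that need to be changed (inserted, updated, or deleted), and then perform the requested changes. Given the following sample data: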
DECLARE @CategoryItem AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId),
    UNIQUE (ItemId, CategoryId)
);

DECLARE @DataSource AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

INSERT @CategoryItem
    (CategoryId, ItemId)
VALUES
    (1, 1),
    (1, 2),
    (1, 3),
    (2, 1),
    (2, 3),
    (3, 5),
    (3, 6),
    (4, 5);

INSERT @DataSource
    (CategoryId, ItemId)
VALUES
    (2, 2);
Target
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 2 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 3 ║
║ 3 ║ 5 ║
║ 4 ║ 5 ║
║ 3 ║ 6 ║
╚════════════╩════════╝
Source
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝
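The desired outcome is to replace the data in the target with the data from the source, but only for CategoryId = 2. Following the description of MERGE given above, we should write a query that joins the source and target on the keys only, and filters rows only in the WHEN clauses: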
MERGE INTO @CategoryItem AS TARGET
USING @DataSource AS SOURCE ON
    SOURCE.ItemId = TARGET.ItemId
    AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY SOURCE
    AND TARGET.CategoryId = 2
    THEN DELETE
WHEN NOT MATCHED BY TARGET
    AND SOURCE.CategoryId = 2
    THEN INSERT (CategoryId, ItemId)
        VALUES (CategoryId, ItemId)
OUTPUT
    $ACTION,
    ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
    ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
This gives the following results (the MERGE output first, then the final contents of the target):
╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 2 ║
║ 3 ║ 5 ║
║ 3 ║ 6 ║
║ 4 ║ 5 ║
╚════════════╩════════╝
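As a rough mental model of the join-then-filter description (a sketch only, not a claim about how SQL Server implements MERGE internally), the statement above can be imitated with a plain full outer join on the keys, with the WHEN conditions rewritten as a CASE expression. The table variables are redeclared so the batch stands on its own, and the WouldDo column name is made up for illustration:

```sql
-- Same sample data as above, redeclared so this batch is self-contained
DECLARE @CategoryItem AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

DECLARE @DataSource AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

INSERT @CategoryItem (CategoryId, ItemId)
VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 5), (3, 6), (4, 5);

INSERT @DataSource (CategoryId, ItemId)
VALUES (2, 2);

-- Join on the keys only; decide the action afterwards.
-- The key columns are NOT NULL, so a NULL key marks the side with no match.
SELECT
    CASE
        WHEN s.ItemId IS NULL AND t.CategoryId = 2 THEN 'DELETE' -- not matched by source
        WHEN t.ItemId IS NULL AND s.CategoryId = 2 THEN 'INSERT' -- not matched by target
        ELSE 'no action'
    END AS WouldDo, -- hypothetical column name
    ISNULL(t.CategoryId, s.CategoryId) AS CategoryId,
    ISNULL(t.ItemId, s.ItemId) AS ItemId
FROM @CategoryItem AS t
FULL OUTER JOIN @DataSource AS s
    ON s.ItemId = t.ItemId
    AND s.CategoryId = t.CategoryId;
```

With the sample data, this labels the rows (2, 1) and (2, 3) as DELETE, the row (2, 2) as INSERT, and the remaining six target rows as no action, which agrees with the MERGE output shown above.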
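The execution plan for this statement scans both tables in full. We might think this inefficient, because only rows where CategoryId = 2 will be affected in the target table. This is where the warnings in Books Online come in. One misguided attempt to optimize the statement so that it touches only the necessary rows in the target is: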
MERGE INTO @CategoryItem AS TARGET
USING
(
    SELECT CategoryId, ItemId
    FROM @DataSource AS ds
    WHERE CategoryId = 2
) AS SOURCE ON
    SOURCE.ItemId = TARGET.ItemId
    AND TARGET.CategoryId = 2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CategoryId, ItemId)
    VALUES (CategoryId, ItemId)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT
    $ACTION,
    ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
    ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
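The logic in the ON clause is applied as part of the join. In this case, the join is a full outer join (see Books Online for why). Applying the check for category 2 to the target rows as part of an outer join ultimately results in rows with a different value being deleted (because they do not match the source):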
╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 1 ║ 1 ║
║ DELETE ║ 1 ║ 2 ║
║ DELETE ║ 1 ║ 3 ║
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
║ DELETE ║ 3 ║ 5 ║
║ DELETE ║ 3 ║ 6 ║
║ DELETE ║ 4 ║ 5 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝
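The root cause is the same reason that predicates in the ON clause of an outer join behave differently from predicates specified in the WHERE clause. The MERGE syntax (and the join implementation, which depends on the clauses specified) just makes this harder to see.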
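The difference between ON and WHERE predicates in an outer join can be seen without MERGE at all. The following is a minimal sketch using cut-down copies of the sample data; the table variable names @T and @S and the output column aliases are illustrative assumptions, not part of the original example:

```sql
-- Hypothetical, cut-down copies of the target (@T) and source (@S)
DECLARE @T AS TABLE (CategoryId integer NOT NULL, ItemId integer NOT NULL);
DECLARE @S AS TABLE (CategoryId integer NOT NULL, ItemId integer NOT NULL);

INSERT @T (CategoryId, ItemId) VALUES (1, 1), (2, 1), (2, 3);
INSERT @S (CategoryId, ItemId) VALUES (2, 2);

-- Predicate in ON: it only decides which rows count as a match.
-- Target rows that fail it are still returned, with NULL source columns.
SELECT
    T.CategoryId AS TargetCategoryId,
    T.ItemId AS TargetItemId,
    S.CategoryId AS SourceCategoryId,
    S.ItemId AS SourceItemId
FROM @T AS T
FULL OUTER JOIN @S AS S
    ON S.ItemId = T.ItemId
    AND T.CategoryId = 2;

-- Predicate in WHERE: applied after the join, so rows are actually removed.
-- The IS NULL test keeps source rows that have no target match.
SELECT
    T.CategoryId AS TargetCategoryId,
    T.ItemId AS TargetItemId,
    S.CategoryId AS SourceCategoryId,
    S.ItemId AS SourceItemId
FROM @T AS T
FULL OUTER JOIN @S AS S
    ON S.ItemId = T.ItemId
WHERE
    T.CategoryId = 2
    OR T.CategoryId IS NULL;
```

The first query returns four rows, including the category 1 row (1, 1) with NULL source columns. In MERGE terms that row is "not matched by source", which is why the misguided statement deleted it. The second query returns three rows, and the category 1 row is gone.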
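The guidance in Books Online (expanded on in the "Optimizing Performance" entry) ensures that the correct semantics are expressed using MERGE syntax, without the user necessarily having to understand all the implementation details, or account for the ways in which the optimizer might legitimately rearrange things for reasons of execution efficiency.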
The documentation offers three potential ways to implement early filtering:
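Specifying a filtering condition in the WHEN clause guarantees correct results, but may mean that more rows are read and processed from the source and target tables than is strictly necessary (as seen in the first example).
Updating through a view that contains the filtering condition also guarantees correct results (since changed rows must be accessible for update through the view), but this does require a dedicated view, and one that follows the odd conditions for updating views.
Using a common table expression carries similar risks to adding predicates to the ON clause, but for slightly different reasons. In many cases it will be safe, but it requires expert analysis of the execution plan to confirm this (and extensive practical testing). For example: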
WITH TARGET AS
(
    SELECT *
    FROM @CategoryItem
    WHERE CategoryId = 2
)
MERGE INTO TARGET
USING
(
    SELECT CategoryId, ItemId
    FROM @DataSource
    WHERE CategoryId = 2
) AS SOURCE ON
    SOURCE.ItemId = TARGET.ItemId
    AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CategoryId, ItemId)
    VALUES (CategoryId, ItemId)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT
    $ACTION,
    ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
    ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
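This produces correct results (not repeated here) with a more optimal plan. The plan reads only the rows for category 2 from the target table. This might be an important performance consideration if the target table is large, but it is all too easy to get this wrong using MERGE syntax.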
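For comparison, the view-based option listed above might look something like the following. This is a sketch under stated assumptions: views cannot be defined over table variables, so it uses hypothetical permanent tables dbo.CategoryItem and dbo.DataSource and a hypothetical view name dbo.CategoryItemCategory2, and it should only be run in a scratch database. The resulting execution plan would still need to be checked.

```sql
-- Hypothetical permanent objects standing in for the table variables.
-- GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement.
CREATE TABLE dbo.CategoryItem
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

CREATE TABLE dbo.DataSource
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

INSERT dbo.CategoryItem (CategoryId, ItemId) VALUES (1, 1), (2, 1), (2, 3);
INSERT dbo.DataSource (CategoryId, ItemId) VALUES (2, 2);
GO

-- The view carries the filter; WITH CHECK OPTION rejects any change
-- that would produce a row the view itself could not see.
CREATE VIEW dbo.CategoryItemCategory2
AS
SELECT CategoryId, ItemId
FROM dbo.CategoryItem
WHERE CategoryId = 2
WITH CHECK OPTION;
GO

-- Only category 2 rows are visible through the view,
-- so only category 2 rows can be deleted by this statement.
MERGE INTO dbo.CategoryItemCategory2 AS TARGET
USING dbo.DataSource AS SOURCE ON
    SOURCE.ItemId = TARGET.ItemId
    AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY TARGET
    AND SOURCE.CategoryId = 2
    THEN INSERT (CategoryId, ItemId)
        VALUES (SOURCE.CategoryId, SOURCE.ItemId)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
;
```

The extra SOURCE.CategoryId = 2 condition on the insert branch is a design choice for this sketch: without it, a source row for another category would be attempted as an insert and then rejected by the check option.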
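Sometimes it is easier to write the MERGE as separate DML statements. This approach can even perform better than a single MERGE, a fact that often surprises people.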
DELETE ci
FROM @CategoryItem AS ci
WHERE ci.CategoryId = 2
AND NOT EXISTS
(
    SELECT 1
    FROM @DataSource AS ds
    WHERE
        ds.ItemId = ci.ItemId
        AND ds.CategoryId = ci.CategoryId
);

INSERT @CategoryItem
SELECT
    ds.CategoryId,
    ds.ItemId
FROM @DataSource AS ds
WHERE
    ds.CategoryId = 2;
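One caveat with the INSERT above: it works for the sample data because no category 2 row exists in both tables once the DELETE has run. If a source row were already present in the target, the plain INSERT would fail with a primary key violation. A possible variation, shown here with slightly different illustrative data so that the case actually occurs, adds a NOT EXISTS guard to the INSERT:

```sql
-- Illustrative data: (2, 2) exists in both tables, (2, 4) only in the source
DECLARE @CategoryItem AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

DECLARE @DataSource AS TABLE
(
    CategoryId integer NOT NULL,
    ItemId integer NOT NULL,
    PRIMARY KEY (CategoryId, ItemId)
);

INSERT @CategoryItem (CategoryId, ItemId) VALUES (1, 1), (2, 1), (2, 2);
INSERT @DataSource (CategoryId, ItemId) VALUES (2, 2), (2, 4);

-- Remove category 2 target rows that are no longer in the source: (2, 1)
DELETE ci
FROM @CategoryItem AS ci
WHERE ci.CategoryId = 2
AND NOT EXISTS
(
    SELECT 1
    FROM @DataSource AS ds
    WHERE
        ds.ItemId = ci.ItemId
        AND ds.CategoryId = ci.CategoryId
);

-- Add category 2 source rows that are not yet in the target: (2, 4)
INSERT @CategoryItem
    (CategoryId, ItemId)
SELECT
    ds.CategoryId,
    ds.ItemId
FROM @DataSource AS ds
WHERE
    ds.CategoryId = 2
    AND NOT EXISTS
    (
        SELECT 1
        FROM @CategoryItem AS ci
        WHERE
            ci.ItemId = ds.ItemId
            AND ci.CategoryId = ds.CategoryId
    );

-- Final contents: (1, 1), (2, 2), (2, 4)
SELECT CategoryId, ItemId
FROM @CategoryItem
ORDER BY CategoryId, ItemId;
```

Against a permanent table with concurrent activity, the two statements would probably also need to run inside a single transaction with suitable isolation; that is outside the scope of this sketch.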