非常抱歉,非常详细的问题。我已包含查询以生成用于重现该问题的完整数据集,并且我在32核计算机上运行SQL Server 2012。但是,我不认为这是特定于SQL Server 2012的,对于此特定示例,我已将MAXDOP强制设置为10。
我有两个使用相同分区方案进行分区的表。当在用于分区的列上将它们连接在一起时,我注意到SQL Server无法像人们期望的那样优化并行合并连接,因此选择使用HASH JOIN。在这种特殊情况下,我可以通过基于分区函数将查询分为10个不相交的范围并在SSMS中同时运行每个查询,来手动模拟一个更优化的并行MERGE JOIN。使用WAITFOR精确地同时运行它们,结果是所有查询在原始并行HASH JOIN使用的总时间的约40%内完成。
对于等效分区的表,是否有任何方法可以使SQL Server自行进行此优化?我了解到,SQL Server通常会为了使MERGE JOIN并行而产生大量开销,但是在这种情况下,似乎有一种非常自然的分片方法,开销很小。也许仅仅是一个特殊的情况,优化器还不够聪明以至于无法识别?
下面是设置简化数据集以重现此问题的SQL:
/* Create the first test data table */
CREATE TABLE test_transaction_properties
( transactionID INT NOT NULL IDENTITY(1,1)
, prop1 INT NULL
, prop2 FLOAT NULL
)
/* Populate table with pseudo-random data (the specific data doesn't matter too much for this example) */
;WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
, E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b)
, E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
, E8(N) AS (SELECT 1 FROM E4 a CROSS JOIN E4 b)
INSERT INTO test_transaction_properties WITH (TABLOCK) (prop1, prop2)
SELECT TOP 10000000 (ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 5) + 1 AS prop1
, ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) * rand() AS prop2
FROM E8
/* Create the second test data table */
CREATE TABLE test_transaction_item_detail
( transactionID INT NOT NULL
, productID INT NOT NULL
, sales FLOAT NULL
, units INT NULL
)
/* Populate the second table such that each transaction has one or more items
(again, the specific data doesn't matter too much for this example) */
INSERT INTO test_transaction_item_detail WITH (TABLOCK) (transactionID, productID, sales, units)
SELECT t.transactionID, p.productID, 100 AS sales, 1 AS units
FROM test_transaction_properties t
JOIN (
SELECT 1 as productRank, 1 as productId
UNION ALL SELECT 2 as productRank, 12 as productId
UNION ALL SELECT 3 as productRank, 123 as productId
UNION ALL SELECT 4 as productRank, 1234 as productId
UNION ALL SELECT 5 as productRank, 12345 as productId
) p
ON p.productRank <= t.prop1
/* Divides the transactions evenly into 10 partitions */
CREATE PARTITION FUNCTION [pf_test_transactionId] (INT)
AS RANGE RIGHT
FOR VALUES
(1,1000001,2000001,3000001,4000001,5000001,6000001,7000001,8000001,9000001)
CREATE PARTITION SCHEME [ps_test_transactionId]
AS PARTITION [pf_test_transactionId]
ALL TO ( [PRIMARY] )
/* Apply the same partition scheme to both test data tables */
ALTER TABLE test_transaction_properties
ADD CONSTRAINT PK_test_transaction_properties
PRIMARY KEY (transactionID)
ON ps_test_transactionId (transactionID)
ALTER TABLE test_transaction_item_detail
ADD CONSTRAINT PK_test_transaction_item_detail
PRIMARY KEY (transactionID, productID)
ON ps_test_transactionId (transactionID)
现在,我们终于可以重现次优查询了!
/* This query produces a HASH JOIN using 20 threads without the MAXDOP hint,
and the same behavior holds in that case.
For simplicity here, I have limited it to 10 threads. */
SELECT COUNT(*)
FROM test_transaction_item_detail i
JOIN test_transaction_properties t
ON t.transactionID = i.transactionID
OPTION (MAXDOP 10)
但是,使用单个线程来处理每个分区(下面第一个分区的示例)将导致更有效的计划。我通过在同一时刻针对10个分区中的每个分区运行以下查询来测试了这一点,所有10个分区都在1秒钟内完成:
SELECT COUNT(*)
FROM test_transaction_item_detail i
INNER MERGE JOIN test_transaction_properties t
ON t.transactionID = i.transactionID
WHERE t.transactionID BETWEEN 1 AND 1000000
OPTION (MAXDOP 1)