SQL Server有一个叫做“多列统计”的东西,但这不是人们想的那样。
让我们看下面的示例表:
CREATE TABLE BadStatistics
(
IsArchived BIT NOT NULL,
Id INT NOT NULL IDENTITY PRIMARY KEY,
Mystery VARCHAR(200) NOT NULL
);
CREATE NONCLUSTERED INDEX BadIndex
ON BadStatistics (IsArchived, Mystery);
这样,将在我们拥有的两个索引上创建两个统计信息:
BadIndex的统计信息:
+--------------+----------------+-------------------------+
| All density | Average Length | Columns |
+--------------+----------------+-------------------------+
| 0.5 | 1 | IsArchived |
+--------------+----------------+-------------------------+
| 4.149378E-06 | 37 | IsArchived, Mystery |
+--------------+----------------+-------------------------+
| 4.149378E-06 | 41 | IsArchived, Mystery, Id |
+--------------+----------------+-------------------------+
+--------------+------------+---------+---------------------+----------------+
| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
+--------------+------------+---------+---------------------+----------------+
| 0 | 0 | 24398 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
| 1 | 0 | 216602 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
聚集索引的统计信息:
+--------------+----------------+---------+
| All density | Average Length | Columns |
+--------------+----------------+---------+
| 4.149378E-06 | 4 | Id |
+--------------+----------------+---------+
+--------------+------------+---------+---------------------+----------------+
| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
+--------------+------------+---------+---------------------+----------------+
| 1 | 0 | 1 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
| 240999 | 240997 | 1 | 240997 | 1 |
+--------------+------------+---------+---------------------+----------------+
| 241000 | 0 | 1 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
(我在表中填充了随机样本数据,其中大约十分之一的行未归档。此后,我运行了一次完整的扫描统计信息更新。)
为什么两栏统计的直方图只使用一列?我知道,很多人都写关于它做,但有什么道理呢?在这种情况下,由于第一列只有两个值,因此它使整个直方图的使用效率大大降低。为什么会像这样任意限制统计数据?
请注意,这个问题不是针对多维直方图,它是完全不同的野兽。它是关于一维直方图,其中一维是包含各自的多个列的元组。