如何在SQL Server的“统计信息”中确定直方图步数?
为什么即使我的键列有200多个不同的值,它也限制为200步?有决定因素吗?
演示版
模式定义
CREATE TABLE histogram_step
(
id INT IDENTITY(1, 1),
name VARCHAR(50),
CONSTRAINT pk_histogram_step PRIMARY KEY (id)
)
在表中插入100条记录
INSERT INTO histogram_step
(name)
SELECT TOP 100 name
FROM sys.syscolumns
更新和检查统计信息
UPDATE STATISTICS histogram_step WITH fullscan
DBCC show_statistics('histogram_step', pk_histogram_step)
直方图步骤:
+--------------+------------+---------+---------------------+----------------+
| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
+--------------+------------+---------+---------------------+----------------+
| 1 | 0 | 1 | 0 | 1 |
| 3 | 1 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 |
| 9 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 |
| 13 | 1 | 1 | 1 | 1 |
| 15 | 1 | 1 | 1 | 1 |
| 17 | 1 | 1 | 1 | 1 |
| 19 | 1 | 1 | 1 | 1 |
| 21 | 1 | 1 | 1 | 1 |
| 23 | 1 | 1 | 1 | 1 |
| 25 | 1 | 1 | 1 | 1 |
| 27 | 1 | 1 | 1 | 1 |
| 29 | 1 | 1 | 1 | 1 |
| 31 | 1 | 1 | 1 | 1 |
| 33 | 1 | 1 | 1 | 1 |
| 35 | 1 | 1 | 1 | 1 |
| 37 | 1 | 1 | 1 | 1 |
| 39 | 1 | 1 | 1 | 1 |
| 41 | 1 | 1 | 1 | 1 |
| 43 | 1 | 1 | 1 | 1 |
| 45 | 1 | 1 | 1 | 1 |
| 47 | 1 | 1 | 1 | 1 |
| 49 | 1 | 1 | 1 | 1 |
| 51 | 1 | 1 | 1 | 1 |
| 53 | 1 | 1 | 1 | 1 |
| 55 | 1 | 1 | 1 | 1 |
| 57 | 1 | 1 | 1 | 1 |
| 59 | 1 | 1 | 1 | 1 |
| 61 | 1 | 1 | 1 | 1 |
| 63 | 1 | 1 | 1 | 1 |
| 65 | 1 | 1 | 1 | 1 |
| 67 | 1 | 1 | 1 | 1 |
| 69 | 1 | 1 | 1 | 1 |
| 71 | 1 | 1 | 1 | 1 |
| 73 | 1 | 1 | 1 | 1 |
| 75 | 1 | 1 | 1 | 1 |
| 77 | 1 | 1 | 1 | 1 |
| 79 | 1 | 1 | 1 | 1 |
| 81 | 1 | 1 | 1 | 1 |
| 83 | 1 | 1 | 1 | 1 |
| 85 | 1 | 1 | 1 | 1 |
| 87 | 1 | 1 | 1 | 1 |
| 89 | 1 | 1 | 1 | 1 |
| 91 | 1 | 1 | 1 | 1 |
| 93 | 1 | 1 | 1 | 1 |
| 95 | 1 | 1 | 1 | 1 |
| 97 | 1 | 1 | 1 | 1 |
| 99 | 1 | 1 | 1 | 1 |
| 100 | 0 | 1 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
如我们所见,直方图中有53个步骤。
再次插入几千条记录
INSERT INTO histogram_step
(name)
SELECT TOP 10000 b.name
FROM sys.syscolumns a
CROSS JOIN sys.syscolumns b
更新和检查统计信息
UPDATE STATISTICS histogram_step WITH fullscan
DBCC show_statistics('histogram_step', pk_histogram_step)
现在直方图的步骤减少到4个步骤
+--------------+------------+---------+---------------------+----------------+
| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
+--------------+------------+---------+---------------------+----------------+
| 1 | 0 | 1 | 0 | 1 |
| 10088 | 10086 | 1 | 10086 | 1 |
| 10099 | 10 | 1 | 10 | 1 |
| 10100 | 0 | 1 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
再次插入几千条记录
INSERT INTO histogram_step
(name)
SELECT TOP 100000 b.name
FROM sys.syscolumns a
CROSS JOIN sys.syscolumns b
更新和检查统计信息
UPDATE STATISTICS histogram_step WITH fullscan
DBCC show_statistics('histogram_step', pk_histogram_step)
现在直方图的步骤减少到3个步骤
+--------------+------------+---------+---------------------+----------------+
| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
+--------------+------------+---------+---------------------+----------------+
| 1 | 0 | 1 | 0 | 1 |
| 110099 | 110097 | 1 | 110097 | 1 |
| 110100 | 0 | 1 | 0 | 1 |
+--------------+------------+---------+---------------------+----------------+
有人可以告诉我如何确定这些步骤吗?