SQL Server Index vs. Statistics

What is the difference between CREATE INDEX and CREATE STATISTICS, and when should I use each?

sql-server index statistics

— Scott

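An index stores actual data (data pages or index pages, depending on the type of index we are talking about), whereas statistics store the distribution of the data. Therefore, CREATE INDEX is the DDL to create an index (clustered, nonclustered, etc.), and CREATE STATISTICS is the DDL to create statistics on columns within a table.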

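I recommend you read about these aspects of relational data. Below are a few introductory articles for beginners. These are very broad topics, so the information on them can go very wide and very deep. Read up on the general ideas below, and ask more specific questions as they arise.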

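BOL reference on Table and Index Organization
BOL reference on Clustered Index Structures
BOL reference on Nonclustered Index Structures
SQL Server Central on an Introduction to Indexes
BOL reference on Statistics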

Here is a working example that shows these two pieces in action (commented to explain):

use testdb;
go

create table MyTable1
(
    id int identity(1, 1) not null,
    my_int_col int not null
);
go

insert into MyTable1(my_int_col)
values(1);
go 100

-- this statement will create a clustered index
-- on MyTable1.  The index key is the id field
-- but due to the nature of a clustered index
-- it will contain all of the table data
create clustered index MyTable1_CI
on MyTable1(id);
go


-- by default, SQL Server creates a statistics object
-- for this index.  Here is proof: we see a stat with
-- the same name as the index, and its stat column
-- is the index key column
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- here is a standalone statistics on a single column
create statistics MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- now look at the statistics that exist on the table.
-- we have an additional statistics object that does
-- not correspond to any index
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- what does a stat look like?  run DBCC SHOW_STATISTICS
-- to get a better idea of what is stored
dbcc show_statistics('MyTable1', 'MyTable1_CI');
go

Here is what a test sample of the statistics looks like:

[Image: sample statistics output]

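Note that statistics describe the distribution of the data. They help SQL Server determine an optimal plan. A good analogy: imagine you are about to lift a heavy object. If you know how much it weighs because there is a weight marking on it, you can work out the best way to lift it and which muscles to use. That is roughly what SQL Server does with statistics.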
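To make the "data distribution" point more concrete, here is a minimal sketch that prints only the histogram portion of a statistics object. It assumes MyTable1 and the MyTable1_MyIntCol statistics from the example above already exist; the two SELECT statements are illustrative additions, not part of the original answer.

```sql
-- Sketch only: assumes MyTable1 and the MyTable1_MyIntCol statistics
-- created earlier in this example exist in the current database.

-- show only the histogram (the data-distribution part) of the stat
dbcc show_statistics('MyTable1', 'MyTable1_MyIntCol') with histogram;
go

-- every row was inserted with my_int_col = 1, so the histogram
-- should be very small.  The optimizer reads it to estimate how many
-- rows each predicate below returns before it picks a plan
select id from MyTable1 where my_int_col = 1;  -- matches every row
select id from MyTable1 where my_int_col = 2;  -- matches no rows
go
```

Comparing the estimated row counts in the execution plans of the two queries is one way to see the statistics being used.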

-- create a nonclustered index
-- with the key column as my_int_col
create index IX_MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- let's look at this index
select
    object_name(object_id) as object_name,
    name as index_name,
    index_id,
    type_desc,
    is_unique,
    fill_factor
from sys.indexes
where name = 'IX_MyTable1_MyIntCol';

-- now let's see some physical aspects
-- of this particular index
-- (the index_id of 4 below came from the above query;
-- yours may differ)
select *
from sys.dm_db_index_physical_stats
(
    db_id('TestDB'),
    object_id('MyTable1'),
    4,
    null,
    'detailed'
);
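Because the index_id is copied by hand in the query above, it may not match on another system. A possible variant, sketched under the assumption that the TestDB database, MyTable1, and IX_MyTable1_MyIntCol from this example exist, looks the id up by name first. The variable name @index_id and the chosen column list are illustrative.

```sql
-- Sketch only: assumes TestDB, MyTable1, and IX_MyTable1_MyIntCol
-- from the example above.  Run it in the TestDB database.
declare @index_id int;

-- look up the index_id by name instead of copying it by hand
select @index_id = index_id
from sys.indexes
where object_id = object_id('MyTable1')
and name = 'IX_MyTable1_MyIntCol';

-- one row per level of the index, with page and record counts
select
    index_id,
    index_type_desc,
    index_level,
    page_count,
    record_count
from sys.dm_db_index_physical_stats
(
    db_id('TestDB'),
    object_id('MyTable1'),
    @index_id,
    null,
    'detailed'
);
go
```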

As we can see from the example above, the index actually contains data (what the leaf pages hold depends on the type of index).

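This post gives only a very, very brief overview of these two big aspects of SQL Server. Each could fill chapters and books. Read some of the references, and you will have a better grasp of them.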

— Thomas Stringer

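I know this is an old post, but it is worth noting that creating an index will (in most cases) automatically generate statistics for that index. The reverse is not true: creating statistics does not create an index.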

— Steve Mangiameli, 2014
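The point in the comment above can be checked with a short query. This is a sketch that assumes MyTable1 and the objects created in the answer still exist; the has_index column is an illustrative name. It relies on the documented behavior that statistics belonging to an index have a stats_id equal to that index's index_id.

```sql
-- Sketch only: assumes MyTable1 and the index and statistics
-- objects created in the answer above.

-- list every statistics object on the table and flag whether
-- an index with the same id exists
select
    s.name as stats_name,
    s.auto_created,
    s.user_created,
    case when i.index_id is null then 'no' else 'yes' end as has_index
from sys.stats s
left join sys.indexes i
on s.object_id = i.object_id
and s.stats_id = i.index_id
where s.object_id = object_id('MyTable1');
go
```

With the objects from the answer, the two index-backed statistics would be expected to show has_index = 'yes', while MyTable1_MyIntCol, created with CREATE STATISTICS, would show 'no'.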