选择索引列顺序时,首要的考虑是:
我的查询中是否有针对此列的(相等)谓词?
如果列从未出现在where子句中,则不值得索引(1)
好的,因此您有一个表,并针对每一列进行查询。有时不止一个。
您如何确定要编制索引的内容?
让我们来看一个例子。这是一个包含三列的表格。一个保存10个值,另一个保存1000个,最后10,000个:
create table t(
few_vals varchar2(10),
many_vals varchar2(10),
lots_vals varchar2(10)
);
insert into t
with rws as (
select lpad(mod(rownum, 10), 10, '0'),
lpad(mod(rownum, 1000), 10, '0'),
lpad(rownum, 10, '0')
from dual connect by level <= 10000
)
select * from rws;
commit;
select count(distinct few_vals),
count(distinct many_vals) ,
count(distinct lots_vals)
from t;
COUNT(DISTINCTFEW_VALS) COUNT(DISTINCTMANY_VALS) COUNT(DISTINCTLOTS_VALS)
10 1,000 10,000
这些数字用零填充。这将有助于以后进行压缩。
因此,您有三个常见查询:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
你索引什么?
仅针对little_vals的索引仅比全表扫描略胜一筹:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 61 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 2 | VIEW | VW_DAG_0 | 1 | 1000 | 1000 |00:00:00.01 | 61 |
| 3 | HASH GROUP BY | | 1 | 1000 | 1000 |00:00:00.01 | 61 |
| 4 | TABLE ACCESS FULL| T | 1 | 1000 | 1000 |00:00:00.01 | 61 |
-------------------------------------------------------------------------------------------
select /*+ index (t (few_vals)) */
count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 58 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 58 |
| 2 | VIEW | VW_DAG_0 | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 3 | HASH GROUP BY | | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 58 |
| 5 | INDEX RANGE SCAN | FEW | 1 | 1000 | 1000 |00:00:00.01 | 5 |
-------------------------------------------------------------------------------------------------------------
因此,单独索引不太可能。lot_vals上的查询返回几行(在这种情况下,仅返回1)。因此,这绝对值得索引。
但是,针对这两列的查询又如何呢?
你应该索引:
( few_vals, lots_vals )
要么
( lots_vals, few_vals )
技巧问题!
答案都不是。
当然little_vals是一个长字符串。因此,您可以获得很好的压缩效果。而且,您(可能)使用仅对lot_vals谓词的(few_vals,lots_vals)进行查询的索引跳过扫描。但我不在这里,尽管它的性能明显优于完整扫描:
create index few_lots on t(few_vals, lots_vals);
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 61 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 61 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 61 |
| 4 | TABLE ACCESS FULL| T | 1 | 1 | 1 |00:00:00.01 | 61 |
-------------------------------------------------------------------------------------------
select /*+ index_ss (t few_lots) */count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 13 | 11 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 13 | 11 |
| 5 | INDEX SKIP SCAN | FEW_LOTS | 1 | 40 | 1 |00:00:00.01 | 12 | 11 |
----------------------------------------------------------------------------------------------------------------------
你喜欢赌博吗?(2)
因此,您仍然需要一个以lots_vals作为前导列的索引。至少在这种情况下,复合指数(少量,手数)所做的工作量与单手(很多)上的工作量相同
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 3 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 3 |
| 5 | INDEX RANGE SCAN | FEW_LOTS | 1 | 1 | 1 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------------------
create index lots on t(lots_vals);
select /*+ index (t (lots_vals)) */count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where lots_vals = '0000000001'
and few_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 3 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
| 5 | INDEX RANGE SCAN | LOTS | 1 | 1 | 1 |00:00:00.01 | 2 | 1 |
----------------------------------------------------------------------------------------------------------------------
在某些情况下,复合索引可以为您节省1-2个IO。但是值得为此节省两个索引吗?
而且复合索引还有另一个问题。比较三个索引(包括LOTS_VALS)的聚类因子:
create index lots on t(lots_vals);
create index lots_few on t(lots_vals, few_vals);
create index few_lots on t(few_vals, lots_vals);
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where table_name = 'T';
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_LOTS 47 10,000 530
LOTS_FEW 47 10,000 53
LOTS 31 10,000 53
FEW 31 10 530
请注意,leather_lots的聚类因子比lots和lots_few 高10倍!这是在演示表中,该表首先具有完美的群集。在现实世界的数据库中,效果可能会更糟。
那有什么不好的呢?
聚类因子是确定索引“吸引力”的关键驱动因素之一。它越高,优化器选择它的可能性就越小。特别是在lots_vals实际上并非唯一的情况下,但通常每个值仍然只有几行。如果您不走运,这可能足以使优化程序认为完整扫描更便宜...
好的,因此具有little_vals和lots_vals的复合索引仅具有边缘案例的优势。
查询过滤little_vals和many_vals呢?
单列索引只会带来很小的好处。但是结合起来,它们返回的值很少。因此,综合索引是一个好主意。但是,哪条路呢?
如果您不放在首位,则压缩前导列会使该列变小
create index few_many on t(many_vals, few_vals);
create index many_few on t(few_vals, many_vals);
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_MANY 47 1,000 10,000
MANY_FEW 47 1,000 10,000
alter index few_many rebuild compress 1;
alter index many_few rebuild compress 1;
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
MANY_FEW 31 1,000 10,000
FEW_MANY 34 1,000 10,000
越少,前导列中的不同值将压缩得越好。因此,读取此索引的工作量略有减少。但是只有一点点。而且两者都已经比原来的小了很多(大小减小了25%)。
您可以进一步压缩整个索引!
alter index few_many rebuild compress 2;
alter index many_few rebuild compress 2;
select index_name, leaf_blocks, distinct_keys, clustering_factor
from user_indexes
where index_name in ('FEW_MANY', 'MANY_FEW');
INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
FEW_MANY 20 1,000 10,000
MANY_FEW 20 1,000 10,000
现在,两个索引都恢复到相同的大小。请注意,这利用了一个事实,即很少有很多关系。同样,您不太可能在现实世界中看到这种好处。
到目前为止,我们仅讨论了平等检查。通常,使用复合索引时,其中一列会不等式。例如,诸如“在过去N天内为客户获取订单/装运/发票”之类的查询。
如果您有这些类型的查询,则需要与索引的第一列相等:
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals < '0000000002'
and many_vals = '0000000001';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 12 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 12 |
| 2 | VIEW | VW_DAG_0 | 1 | 10 | 10 |00:00:00.01 | 12 |
| 3 | HASH GROUP BY | | 1 | 10 | 10 |00:00:00.01 | 12 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 10 | 10 |00:00:00.01 | 12 |
| 5 | INDEX RANGE SCAN | FEW_MANY | 1 | 10 | 10 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------------------
select count (distinct few_vals || ':' || many_vals || ':' || lots_vals )
from t
where few_vals = '0000000001'
and many_vals < '0000000002';
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST -PREDICATE'));
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 12 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 12 | 1 |
| 2 | VIEW | VW_DAG_0 | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 3 | HASH GROUP BY | | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2 | 10 |00:00:00.01 | 12 | 1 |
| 5 | INDEX RANGE SCAN | MANY_FEW | 1 | 1 | 10 |00:00:00.01 | 2 | 1 |
----------------------------------------------------------------------------------------------------------------------
注意,他们使用相反的索引。
TL; DR
- 具有相等条件的列应在索引中排在第一位。
- 如果查询中有多个相等的列,则将具有最小不同值的列放在最前面将提供最佳的压缩优势
- 尽管可以进行索引跳过扫描,但您需要确信,这在可预见的将来仍将是可行的选择
- 包括近唯一列的复合索引所带来的好处最小。确保您确实需要保存1-2个IO
1:在某些情况下,如果这意味着查询中的所有列都在索引中,则可能值得在索引中包括一列。这将启用仅索引扫描,因此您无需访问表。
2:如果您获得了诊断和调优的许可,则可以使用SQL计划管理将计划强制跳过扫描
阿德尼达
PS-您引用的文档来自9i。那太老了。我会坚持最近的事情