如何在MySQL中为具有NULL值的列设计索引?


11

我有一个拥有4000万个条目的数据库,并希望使用以下WHERE子句运行查询

...
WHERE
  `POP1` IS NOT NULL 
  && `VT`='ABC'
  && (`SOURCE`='HOME')
  && (`alt` RLIKE '^[AaCcGgTt]$')
  && (`ref` RLIKE '^[AaCcGgTt]$')
  && (`AA` RLIKE '^[AaCcGgTt]$')
  && (`ref` = `AA` || `alt` = `AA`)
LIMIT 10 ;

POP1是一个浮点列,也可以为NULL。POP1 IS NOT NULL应该排除大约50%的条目,这就是为什么我将其放在开头。所有其他术语仅会稍微减少该数字。

除其他外,我设计了一个index pop1_vt_source,它似乎没有使用,而使用了索引vt为第一列的索引。解释输出:

| id | select_type | table | type | possible_keys                          | key                 | key_len | ref         | rows     | Extra       |
|  1 | SIMPLE      | myTab | ref  | vt_source_pop1_pop2,pop1_vt_source,... | vt_source_pop1_pop2 | 206     | const,const | 20040021 | Using where |

为什么pop1不使用第一列的索引?因为NOT或由于NULL一般。如何改善索引和WHERE子句的设计?即使限制为10个条目,查询也要花费30秒以上的时间,尽管表中的前100个条目应包含10个匹配项。

Answers:


10

它是NOT NULL

CREATE TEMPORARY TABLE `myTab` (`notnul` FLOAT, `nul` FLOAT);
INSERT INTO `myTab` VALUES (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2);
SELECT * FROM `myTab`;

给出:

+--------+------+
| notnul | nul  |
+--------+------+
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
+--------+------+

创建索引:

CREATE INDEX `notnul_nul` ON `myTab` (`notnul`, `nul`);
CREATE INDEX `nul_notnul` ON `myTab` (`nul`, `notnul`);

SHOW INDEX FROM `myTab`;

给出:

+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| myTab |          1 | notnul_nul |            1 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | notnul_nul |            2 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            1 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            2 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

现在解释选择。似乎MySQL使用了索引,即使您使用了NOT NULL

EXPLAIN SELECT * FROM `myTab` WHERE `notnul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
|  1 | SIMPLE      | myTab | index | notnul_nul    | notnul_nul | 10      | NULL |   12 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+


EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | nul_notnul    | nul_notnul | 5       | NULL |    6 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+

但是,当与NOT NULLand 比较时NULL,似乎MySQL在使用时更喜欢其他索引NOT NULL。尽管这显然不会添加任何信息。这是因为MySQL NOT NULL在类型列中将其解释为范围。我不确定是否有解决方法:

EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NULL && notnul=2;
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | myTab | ref  | notnul_nul,nul_notnul | notnul_nul | 10      | const,const |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+


EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL && notnul=2;
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys         | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | notnul_nul,nul_notnul | notnul_nul | 10      | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+

我认为在MySQL中可能会有更好的实现,因为这NULL是一个特殊的价值。可能大多数人都对NOT NULL价值观感兴趣。


3

问题不是NULL值。它是指标的选择性。在您的示例中,的选择性source, pop1优于的选择性pop1。它涵盖了该where子句中的更多条件,因此更有可能减少页面点击。

您可能会认为将行数减少50%就足够了,但事实并非如此。where子句中索引的好处是减少了正在读取的页面数。如果一个页面平均至少有一个具有非NULL值的记录,则使用索引没有任何好处。而且,如果每页有10条记录,那么几乎每页都会有其中一条记录。

您可以尝试在上建立索引(pop1, vt, source)。优化器应该选择一个。

但最后,如果该where子句不断丢失记录-没有规则,但是假设20%-那么索引可能无济于事。一种例外是索引包含查询所需的所有列。这样就可以满足查询,而不必为每个记录引入数据页。

而且,如果使用索引并且选择性很高,那么使用索引的性能可能会比不使用索引的性能差。


我认为确实是造成差异的范围(请参阅我的答案)。尽管我认为可以在MySQL中更好地实现它,但是由于大多数人都对NOT NULL专栏感兴趣。
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.