0%

MySQL中的松散索引和紧凑索引

MySQL Group By中的松散索引和紧凑索引

最近看官方文档中发现,有两个索引扫描方式:松散索引和紧凑索引,以前也是看到过,但是一直没有去详细的了解,这次看了下,发现官方文档中说的也不是很清楚,就自己总结下对这两种索引扫描方式的理解

松散索引(Loose Index Scan)

在官方文档中提到,当有Group By子句时,在特定的情况下可以使用松散索引,减少扫描的次数。官方文档中给出了以下的一些情况可以使用到松散索引:

1
2
3
4
5
6
7
SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;

根据官方文档的描述,也就是当语句能满足索引”最左原则”时就可以使用到松散索引。创建个表用于测试:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE TABLE `t1` (
`c1` int(11) DEFAULT NULL,
`c2` int(11) DEFAULT NULL,
`c3` int(11) NOT NULL,
`c4` int(11) DEFAULT NULL,
KEY `idx` (`c1`,`c2`,`c3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
表中数据如下:
root@localhost:mysql-8.0-3319.sock 10:30:32 [dhy]>select * from t1;
+------+------+----+------+
| c1 | c2 | c3 | c4 |
+------+------+----+------+
| 1 | 2 | 1 | 3 |
| 1 | 2 | 2 | 3 |
| 1 | 2 | 3 | 3 |
+------+------+----+------+
3 rows in set (0.01 sec)

以”SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;”这个语句为例,查看其执行计划:

1
2
3
4
5
6
7
root@localhost:mysql-8.0-3319.sock  10:32:37 [dhy]>explain SELECT c1, c2 FROM t1 WHERE c3 = 1 GROUP BY c1, c2;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | idx | idx | 14 | NULL | 2 | 100.00 | Using where; Using index for group-by |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

执行计划中Extra这列中出现了 Using index for group-by,代表使用了松散索引,但为何这个语句能用到松散索引,松散索引是到底是什么呢?
个人对松散的理解,就像是跳跃索引,”SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;”这个语句group by c1,c2遵循索引的”最左原则”c1,c2这两列是有序排列的,则可以先对前两列distinct,再过滤c3这个字段,这样减少了前两列的扫描次数。

紧凑索引(Tight Index Scan)

紧凑索引就像是进行了索引全表扫描,以下这两个语句不能使用松散索引,但是可以使用紧凑索引扫描:

1
2
SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;

先看下执行计划:

1
2
3
4
5
6
7
root@localhost:mysql-8.0-3319.sock  11:00:42 [dhy]>explain SELECT c1, c2, c3 FROM t1 WHERE c2 = 2  GROUP BY c1, c3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | index | idx | idx | 14 | NULL | 3 | 33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

这里type是index,也就是索引全表扫描,并且Extra里没有Using index for group-by,为何这里不能利用松散索引呢?因为group by字段是c1 和 c3两个字段,中间出现了GAP,所以无法按”SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;”这个SQL一样先distinct c1 ,c2这两个字段,因为c1,c3两个字段组合起来不一定是排序的,只能逐条扫描然后再过滤c2字段的值。例如数据是这样的:

1
2
3
4
5
6
7
8
9
10
11
12
+------+------+----+------+
| c1 | c2 | c3 | c4 |
+------+------+----+------+
| 1 | 1 | 3 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 2 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 3 |
| 2 | 2 | 1 | 3 |
| 2 | 3 | 3 | 3 |
+------+------+----+------+

如果按c1和c3两个字段先group by则数据会是这样的:

1
2
3
4
5
6
7
8
c1 c3 c2
1 3 1
1 1 2
1 2 2
1 1 3
2 2 1
2 1 2
2 3 3

显然c1 和c3 这两列的数据不是排序好的(1,1,2),(1,2,2),(1,1,3),所以如果要做distinct操作则需要进行一次排序操作,或者根据c1这一列开始做索引的全表扫描,这样就保证了数据是有序的。

总结

松散索引其实就是遵循最左原则,这样数据都是有序的,可以不进行排序直接distinct,然后过滤where条件的字段。而紧凑索引数据是无序的,需要进行索引全扫描,会造成多扫描一些数据。