
插入意向锁死锁案例

现象

客户环境中遇到一个死锁现象,客户不能理解在这种情况下为何会发生死锁。通常死锁发生的情况如下:
当两个事务都试图获取对方已经持有的锁时,就会发生死锁。

例如事务1在记录A上获得锁,事务2在记录B上获得锁,随后每个事务再尝试获取对方持有的锁,就会互相等待,触发死锁。

但这个案例中死锁的产生和上述所讲情况有些不一样,我们来模拟下:

  • 环境信息

    • 事务隔离级别 :RR
    • MySQL版本 : 8.0.13
  • 表结构:

CREATE TABLE t (a INT UNSIGNED NOT NULL PRIMARY KEY, b INT);
INSERT INTO t VALUES(10,0),(20,0);
  • 复现情况如下:
  1. session1: BEGIN; UPDATE t SET b=1 WHERE a=20;   //执行成功
  2. session2: BEGIN; SELECT * FROM t LOCK IN SHARE MODE;   //发生阻塞
  3. session1: INSERT INTO t VALUES(11,1);   //执行成功,同一时刻session2的SELECT报死锁错误:
     ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

现象如上所示:Session1执行了UPDATE语句;随后Session2执行了一个带LOCK IN SHARE MODE的全表查询,发生了阻塞;之后Session1再次执行INSERT语句的同一时刻,Session2直接报出死锁,事务被回滚。

分析

这个现象中疑惑的地方有几点:

  1. 两个事务之间是如何加锁的?
  2. 为何就产生了死锁?
  3. 发生死锁后为什么是session2被回滚了?

让我们来逐一分析下

两个事务之间是如何加锁的?

当Session1执行完UPDATE后,加锁情况如下:

这里看到对表上添加了IX锁,同时对a=20这条记录添加了X锁。Session2执行后的加锁情况:

Session2执行后总共会申请三个锁:

  1. 表上添加IS锁
  2. a=10 这条记录上添加GAP S-lock锁
  3. a=20 这条记录上添加GAP S-lock锁

GAP S-lock本质是Next Key Lock (S)。在第20章中我们介绍过LOCK_MODE各种显示结果对应的锁类型,这里的GAP S-lock代表的就是GAP锁+S记录锁的组合,等于Next Key Lock (S)。
Session2执行时会发生阻塞,是因为a=20这条记录上已经被Session1持有了X锁,与将要申请的S锁冲突。这里需要注意Session2上锁的类型应是GAP S-lock,这条语句上锁的范围是(-∞,10],(10,20],(20,+∞)。由于申请a=20记录上的S锁时发生了阻塞,我们看不到“supremum pseudo-record”;如果单独执行这条语句,就能看到完整的加锁范围。
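在MySQL 8.0中,可以在另一个会话里通过performance_schema.data_locks直接观察各事务的加锁情况,下面是一个示意查询(仅为排查示例,需要在事务尚未提交时执行,库名test为假设值):

-- 查看表t上各事务持有/等待的锁,表级IX/IS锁与行级锁都会列出
SELECT engine_transaction_id, object_name, index_name,
       lock_type, lock_mode, lock_status, lock_data
FROM performance_schema.data_locks
WHERE object_schema = 'test' AND object_name = 't';

其中lock_mode列会显示X、S、S,GAP、X,INSERT_INTENTION等组合,可以与上面的分析互相印证。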

之后Session1执行的INSERT语句将会产生插入意向锁(Insert intention lock):

为何就产生了死锁?

我们看下gap锁与插入意向锁的兼容情况:

                         gap X-lock     gap S-lock     Insert intention lock
Gap X-lock               Incompatible   Incompatible
Gap S-lock               Incompatible   Compatible
Insert intention lock    Incompatible   Incompatible

根据以上表格我们再来分析这个锁问题:

  1. session1: a=20 -> lock(X)   //记录锁X
  2. session2: a=10 -> GAP S-lock 成功;a=20 -> GAP S-lock 等待session1
  3. session1: INSERT -> 插入意向锁,与session2已持有的GAP S-lock冲突
session2等待session1释放a=20上的X锁,而session1随后申请的插入意向锁又与session2已持有的GAP S-lock不兼容,两个事务互相等待、都无法继续进行下去,就造成了死锁。

发生死锁后为什么是session2被回滚了?

InnoDB在发现死锁时会选择回滚占用资源最小的事务,通过innodb_trx表中的trx_weight来判断占用资源的大小。此案例中单独执行各条SQL,通过查询innodb_trx表得到各自对应的trx_weight如下表所示。由于Session2的SELECT语句对应的trx_weight小于Session1的,所以Session2被回滚:

语句 trx_weight
update 3
select 2
insert 5
所以这里回滚session2
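trx_weight可以在事务执行期间通过information_schema.innodb_trx直接查询到,示意如下(需要在事务尚未提交时执行):

-- trx_weight越小,发生死锁时越可能被选为牺牲者回滚
SELECT trx_id, trx_state, trx_weight, trx_rows_locked, trx_rows_modified, trx_query
FROM information_schema.innodb_trx;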

总结及扩展

这个案例中我们分析了死锁的产生过程,重点是插入意向锁与GAP锁是否兼容,同时也知道了发生死锁需要回滚时InnoDB是如何选择回滚对象的。

MGR中的一致性读

MGR是基于Paxos协议、多个节点之间强同步的高可用方案,但多节点间保证的是最终一致性,平常各节点之间的数据复制仍然是异步的。

MGR中有两种需要一致性读的场景:

  • 手工或自动发生切换的时候
    当主库发生故障时,MGR会选举一个新的主节点。新主节点接管后,默认不等待把与原主节点相差的这部分事务补偿完就对外提供服务,这时客户端再次发送新的请求,就有可能与前一次查询到的结果不一致(MySQL 8.0.14之前都采用的是这种方式)。

  • 数据复制
    由于各节点之间复制是异步的,所以会存在两个客户端分别读取不同节点上的数据库时得到的结果不同。

我们很容易想到:改成同步的不就好了吗?但改为同步方式也有两种情况:

  • 同步写
    每一次写操作都会等待数据同步到各个节点,等待各个节点上都commit(applied)成功。
    这种方式适用于以下场景:
  1. 需要读的强一致性
  2. 读多写少
  • 同步读
    这种方式不会等待每个节点都commit(applied)成功,而是每次读操作时会等待数据应用完成,以便读取到的数据是最新的
    这种方式适用于以下场景:
  1. 需要强一致性读
  2. 写多读少

MGR中的一致性读参数group_replication_consistency

在MySQL 8.0.14中增加了group_replication_consistency参数,用来设置一致性读的级别,该参数可设置为以下几个值:

  • EVENTUAL(默认值):

8.0.14之前的方式,所有的读写操作不会等待数据在各个节点commit(applied)完成

  • BEFORE_ON_PRIMARY_FAILOVER
    发生切换之后将等待积压的事务应用完成之后,才允许读写操作
  • BEFORE
    RW事务操作等待之前所有的事务操作全部完成,同时RO事务操作等待之前的事务全部应用完成,保证读的数据是最新正确的。对应我们上面所说的同步读。
  • AFTER
    RW事务操作会等待其对数据的变更,在所有节点上全部commit(applied)完成,保证之后读的数据是最新正确的。对应我们上面所说的同步写。
  • BEFORE_AND_AFTER
    BEFORE与AFTER的组合

BEFORE与AFTER的区别之处:

  1. BEFORE不会等待数据的变更在所有节点上全部commit(applied)完成,可以理解为写操作还是异步的;对于写多读少的场景,这样可以减少同步RW事务的开销
  2. AFTER的写操作在各个节点上是同步的;对于读多写少的场景,可以减少RO事务同步等待的开销。

group_replication_consistency的作用域

此参数在单个节点上可设置为session或global级别,可以根据不同情况只在session级别对部分操作开启同步,这点非常灵活。但需要注意,AFTER与BEFORE_AND_AFTER会影响到集群中所有online的成员,举个例子如下:

1
2
3
4
5
6
7
8
9
# Group with 3 members: M1, M2 and M3
# on M1 a client executes:
> SET @@SESSION.group_replication_consistency= AFTER;
> BEGIN;
> INSERT INTO t1 VALUES (1); # T1
> COMMIT;
# concurrently, while T1 is being committed, on M2 a client executes:
> SET SESSION group_replication_consistency= EVENTUAL;
> SELECT * FROM t1; # T2

即使T2的一致性级别为EVENTUAL,如果它在T1已经处于M2的提交阶段时开始执行,则T2也要等待T1完成提交后才能执行。
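反过来,如果只是个别查询需要强一致的读,也可以只在发起查询的会话上临时设置BEFORE,示意如下(假设MySQL 8.0.14及以上,表名orders仅为示例):

> SET @@SESSION.group_replication_consistency = 'BEFORE';
> SELECT * FROM orders WHERE order_id = 100;   # 该查询会先等待本节点把之前的事务应用完
> SET @@SESSION.group_replication_consistency = 'EVENTUAL';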

等待时间由什么控制

如果集群压力非常大或者出现异常情况,则可能会一直等待下去,因此需要有一个超时时间。这个超时时间由参数wait_timeout控制(默认是8个小时),如果超过这个时间则会抛出ER_GR_HOLD_WAIT_TIMEOUT错误。

参考资料

https://mysqlhighavailability.com/group-replication-consistency-levels/
https://mysqlhighavailability.com/group-replication-consistent-reads/

现象描述

客户在夜间批量执行数据处理时发生了死锁,是由不同的会话并发删除数据引起的。这个问题的原因比较简单,但想通过这个案例让大家熟悉如何排查死锁问题、如何阅读死锁日志,这才是目的。模拟用户死锁现象后,死锁日志如下:

*** (1) TRANSACTION:
TRANSACTION 39474, ACTIVE 58 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 9, OS thread handle 123145525800960, query id 77 localhost root updating
DELETE FROM t1 WHERE id = 4
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39474 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471399; asc G ;;
3: len 2; hex 6464; asc dd;;

*** (2) TRANSACTION:
TRANSACTION 39475, ACTIVE 46 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 10, OS thread handle 123145526104064, query id 78 localhost root updating
DELETE FROM t1 WHERE id = 3
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471399; asc G ;;
3: len 2; hex 6464; asc dd;;

Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000005; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471375; asc G u;;
3: len 2; hex 6565; asc ee;;

Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000006; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471351; asc G Q;;
3: len 2; hex 6666; asc ff;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000009a32; asc 2;;
2: len 7; hex 01000001462e1f; asc F. ;;
3: len 2; hex 6363; asc cc;;

*** WE ROLL BACK TRANSACTION (2)

如何阅读死锁日志

要排查死锁问题我们就要学会如何查看死锁日志,但MySQL死锁日志看起来并不是很直观,需要我们一步一步耐心分析。
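死锁日志可以通过下面的方式获取(示意):

mysql> SHOW ENGINE INNODB STATUS\G                    -- LATEST DETECTED DEADLOCK部分只保留最近一次死锁
mysql> SET GLOBAL innodb_print_all_deadlocks = ON;    -- 打开后每次死锁都会记录到错误日志,便于事后排查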
我们将上面的死锁日志拆分来读,可以得出以下信息:

  • 两个事务的事务ID
    TRANSACTION 39474
    TRANSACTION 39475

  • 事务39474在执行delete语句时发生了锁等待

    DELETE FROM t1 WHERE id = 4
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39474 lock_mode X locks rec but not gap waiting
    Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
    0: len 4; hex 00000004; asc ;; //聚集索引的值
    1: len 6; hex 000000009a33; asc 3;; //事务ID
    2: len 7; hex 02000001471399; asc G ;; //undo 记录
    3: len 2; hex 6464; asc dd;; //非主键字段的值

    通过以上信息可以得出事务39474执行delete语句时,锁等待发生在申请ID=4这条记录上的X锁“lock_mode X locks rec but not gap waiting”

  • 事务39475持有锁的信息

    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap
    Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
    0: len 4; hex 00000004; asc ;;
    1: len 6; hex 000000009a33; asc 3;;
    2: len 7; hex 02000001471399; asc G ;;
    3: len 2; hex 6464; asc dd;;

    Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
    0: len 4; hex 00000005; asc ;;
    1: len 6; hex 000000009a33; asc 3;;
    2: len 7; hex 02000001471375; asc G u;;
    3: len 2; hex 6565; asc ee;;

    Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
    0: len 4; hex 00000006; asc ;;
    1: len 6; hex 000000009a33; asc 3;;
    2: len 7; hex 02000001471351; asc G Q;;
    3: len 2; hex 6666; asc ff;;

    事务39475持有ID=4、5、6这三条记录上的X锁

  • 事务39475同样在执行delete语句时发生了锁等待

*** (2) TRANSACTION:
TRANSACTION 39475, ACTIVE 46 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 10, OS thread handle 123145526104064, query id 78 localhost root updating
DELETE FROM t1 WHERE id = 3

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000009a32; asc 2;;
2: len 7; hex 01000001462e1f; asc F. ;;
3: len 2; hex 6363; asc cc;;

申请ID=3上的X锁时发生了锁等待,执行的语句是:DELETE FROM t1 WHERE id = 3。由此可以推断39474在id=3上持有了X锁,只是死锁日志中并没有显示出事务39474持有锁的信息。
那么这两个事务加锁的顺序应是:
1. 事务39474持有了id=3上的X锁
2. 事务39475持有了id=4上的X锁
3. 事务39474申请id=4上的X锁时发生了锁等待,执行的语句是:DELETE FROM t1 WHERE id = 4
4. 事务39475申请id=3上的X锁时触发了死锁,因为此时双方都在申请对方持有的锁,都不能再进行下去了。

  • 事务2被回滚

    *** WE ROLL BACK TRANSACTION (2)
  • 事务39475持有ID = 4, 5, 6上的X锁是由哪个语句引起的,无法直观地从死锁日志里看出,可以通过打开general日志、binlog或者查看业务代码来还原整个事务逻辑,打开general log的方式见下面的示意
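其中打开general log的方式示意如下(日志量可能很大,排查完成后应及时关闭,文件路径仅为示例):

SET GLOBAL general_log_file = '/tmp/general.log';
SET GLOBAL general_log = ON;
-- ……复现问题,然后到/tmp/general.log中按线程号还原两个事务的完整语句……
SET GLOBAL general_log = OFF;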

实验步骤及表结构

大家可按以下实验步骤自行模拟测试,表结构及数据如下:

CREATE TABLE t1 (id int unsigned NOT NULL PRIMARY KEY, c1 varchar(10));
INSERT INTO t1 VALUES (1, 'aa'), (2, 'bb'), (3, 'cc'), (4, 'dd'), (5, 'ee'), (6, 'ff');

操作步骤如表所示:

Session1: START TRANSACTION;
Session1: DELETE FROM t1 WHERE id = 1;
Session2: START TRANSACTION;
Session2: DELETE FROM t1 WHERE id = 6;
Session1: DELETE FROM t1 WHERE id = 2;
Session2: DELETE FROM t1 WHERE id = 5;
Session1: DELETE FROM t1 WHERE id = 3;
Session2: DELETE FROM t1 WHERE id = 4;
Session1: DELETE FROM t1 WHERE id = 4;   //发生锁等待
Session2: DELETE FROM t1 WHERE id = 3;   //发生死锁,Session2被回滚

总结

这个案例根本原因是两个会话同时删除数据时,没有控制好删除的顺序造成了死锁,这就需要我们在做应用开发时对数据库操作一定要注意操作数据的前后关系、是否有数据依赖、会话之间是否会操作相同的数据。
通过这个案例我们也了解到了应如何去阅读和分析死锁日志。

PostgreSQL WAL日志名解析

日志名组成

在PG中WAL日志文件名是一串数字,刚开始接触PG的朋友对这个名字都有些疑惑。日志名由16进制字符命名,总共24个字符,由三部分组成:

0000000100000001000000C4
00000001 //时间线ID
00000001 //LogId
000000C4 //logSeg

如何计算?

我们知道由三部分组成,那么又是如何计算呢?公式如下:

WAL segment file name = timelineId + (uint32)((LSN − 1) / (16M * 256)) + (uint32)(((LSN − 1) / 16M) % 256)

我们来算一个试试,先查看当前的LSN位置:

postgres=# select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
1/C469AA30
(1 row)

这里的LSN是'1/C469AA30',我们将其转换为十进制数:

postgres=# select x'1C469AA30'::bigint;
int8
------------
7590226480
(1 row)

利用公式计算:

logSeg:
postgres=# select ((7590226480 - 1) / (16 * 1024 * 1024 )) % 256 ;
?column?
----------
196
(1 row)
196是十进制数 转换为16 进制为 c4
postgres=# select to_hex(196);
to_hex
--------
c4
(1 row)

LogId:
postgres=# select ((7590226480 - 1) / (16::bigint * 1024::bigint * 1024::bigint * 256::bigint) :: int8);
?column?
----------
1
(1 row)
得出的LogId等于1

时间线ID:

postgres@coredumped  ~  pg_controldata|grep TimeLine
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1

算出来的值与通过函数查询的一致:

postgres=# select pg_walfile_name('1/C469AA30');
pg_walfile_name
--------------------------
0000000100000001000000C4
(1 row)
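也可以把上面的公式直接写成一条SQL拼出WAL文件名,与pg_walfile_name()的结果对照(示意写法,假设时间线ID为1;to_hex输出为小写,这里用upper转成大写):

SELECT upper(
          lpad(to_hex(1), 8, '0')                                                               -- 时间线ID
       || lpad(to_hex((x'1C469AA30'::bigint - 1) / (16::bigint * 1024 * 1024 * 256)), 8, '0')   -- LogId
       || lpad(to_hex(((x'1C469AA30'::bigint - 1) / (16::bigint * 1024 * 1024)) % 256), 8, '0') -- logSeg
      ) AS wal_file_name;

结果应与上面pg_walfile_name()返回的0000000100000001000000C4一致。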

通过源码分析

上述公式是在网上的文章中看到的,这里特意查看了下PG源码确认其正确性:

// 这里是计算一个logSegNo
#define XLByteToPrevSeg(xlrp, logSegNo) \
logSegNo = ((xlrp) - 1) / XLogSegSize

//XLOG_SEG_SIZE 定义
#define XLOG_SEG_SIZE (16 * 1024 * 1024)

//这块是拼文件名地方
#define XLogFilePath(path, tli, logSegNo) \
snprintf(path, MAXPGPATH, XLOGDIR "/%08X%08X%08X", tli,
¦(uint32) ((logSegNo) / XLogSegmentsPerXLogId),
¦(uint32) ((logSegNo) % XLogSegmentsPerXLogId))

LogId与logSeg分别是logSegNo除以XLogSegmentsPerXLogId的商和对XLogSegmentsPerXLogId取模的结果,那么XLogSegmentsPerXLogId又是什么呢?


//看到XLogSegmentsPerXLogId的定义我们可以自己计算下
#define XLogSegmentsPerXLogId (UINT64CONST(0x100000000) / XLOG_SEG_SIZE)

postgres=# select x'100000000'::bigint / (16 * 1024 * 1024);
?column?
----------
256
(1 row)
这个值就是256

总结

WAL日志的命名不像MySQL Binlog日志那么直观,有时容易混乱,大家学习时可以多研究、多做实验,这样对自己也是一种提高。

PostgreSQL流复制搭建

PG的流复制类似于Oracle中的DG,利用WAL(类似Redo)日志传输,属于物理复制,优点在于延迟较小。

搭建步骤

环境信息

主库为已初始化好的PG实例,备库为已安装好软件但尚未初始化数据目录的机器。

ip 角色
192.168.56.39 主库
192.168.56.40 备库

主库需配置的参数

  • wal_level 应设置为replica或logical
  • max_wal_senders 默认是10 , 不要设置为小于2
  • synchronous_standby_names 如果使用同步复制,这里需要设置standby的名字
  • wal_keep_segments 设置主库上WAL段文件的保留个数,防止WAL日志还未传送到备库就被删除
  • synchronous_commit 同步提交选项
    • 设置为on,代表需要等待备库将事务日志刷写到磁盘中,主库才可以返回成功
    • 设置为remote_apply,代表需要等待备库应用完事务日志,主库才可以返回成功
    • 设置为remote_write,代表需要等待备库将事务日志写入操作系统缓存中,主库即可返回成功
    • 设置为local,代表事务日志写入到主库本地磁盘中即返回成功 // 相当于不等待备库,同步复制场景下不要设置为local
  • pg_hba.conf 文件中需要添加replication的权限
      host    replication     all             0.0.0.0/0               trust
      这里注意,必须要配置replication权限
      host    all             all             0.0.0.0/0               trust
      这里all 是不包括replication权限的
  • hot_standby 设置为ON,这样备库可以供查询使用
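把上面的参数放在一起,可以在主库上用ALTER SYSTEM统一设置,示意如下(取值仅为示例,wal_level等参数修改后需要重启才能生效,stb40为假设的备库application_name):

postgres=# ALTER SYSTEM SET wal_level = 'replica';
postgres=# ALTER SYSTEM SET max_wal_senders = 10;
postgres=# ALTER SYSTEM SET wal_keep_segments = 256;              -- 按主库写入量评估保留数量
postgres=# ALTER SYSTEM SET synchronous_commit = 'on';
postgres=# ALTER SYSTEM SET synchronous_standby_names = 'stb40';  -- 仅同步复制需要
postgres=# SELECT pg_reload_conf();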

使用pg_basebackup搭建基础备份

备库上使用pg_basebackup拉取主库数据作为基础备份,数据拉取到/pgdata目录下:
    [postgres@pg40 ~]$ pg_basebackup -D /pgdata/ -h 192.168.56.39 -R -X s -P
    335994/335994 kB (100%), 1/1 tablespace

编辑recovery.conf文件

我们使用了-R参数,这样在拉取的备份文件中会自动生成recovery.conf文件:

standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'application_name=stb40 user=postgres passfile=''/home/postgres/.pgpass'' host=192.168.56.39 port=5432 sslmode=prefer sslcompression=1 target_session_attrs=any'

启动数据库

pg_ctl start -D /pgdata/

监控流复制

在主库上查看

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 10807
usesysid | 10
usename | postgres
application_name | stb40
client_addr | 192.168.56.40
client_hostname |
client_port | 50660
backend_start | 2018-05-06 23:27:42.625869+08
backend_xmin |
state | streaming
sent_lsn | 2/95000140
write_lsn | 2/95000140
flush_lsn | 2/95000140
replay_lsn | 2/95000140
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async

如何判断实例为主库还是备库

使用pg_controldata查看

[postgres@pg40 ~]$ pg_controldata |grep cluster
Database cluster state: in archive recovery //备库

[postgres@pg39 log]$ pg_controldata |grep cluster
Database cluster state: in production //主库
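除了pg_controldata,也可以直接在数据库里通过函数判断(示意):

postgres=# select pg_is_in_recovery();   -- 备库返回t,主库返回f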

SQL优化-子查询结果集过多

这个优化案例来自一个实际的业务场景。朋友公司从商业数据库迁移到PG,测试中发现性能存在问题:同等数据量在商业数据库中秒级就能出结果,但在PG中却需要几分钟甚至更久。

经过对SQL的改写,优化效果还是不错的,这里与大家分享下。

现象

最大的一张主表T98_SYT_ACCT_STAT_D_1有5.8亿行数据,其余的表都是百万行级别。
原始SQL语句如下:

select
org.Internal_Org_Id as Stat_Org_Id
,(case when org.Internal_Org_Id = '999999888'
then '合计'
else org.Org_Name
end ) as Stat_Org_Name
,(case when org.Internal_Org_Id = '999999888'
then 1
else 2
end) as ORDERID
,sum(data.jon_acct_num) as jon_acct_num
,sum(data.jon_cross_num) as jon_cross_num
,sum(data.jon_pay_num) as jon_pay_num
,sum(data.total_jon_num) as total_jon_num
,sum(data.des_acct_num) as des_acct_num
,sum(data.des_cross_num) as des_cross_num
,sum(data.des_pay_num) as des_pay_num
,sum(data.total_des_num) as total_des_num
,sum(data.accjon_acct_num) as accjon_acct_num
,sum(data.accjon_cross_num) as accjon_cross_num
,sum(data.accjon_pay_num) as accjon_pay_num
,sum(data.total_accjon_num) as total_accjon_num
,sum(data.accjon_acct_amt) as accjon_acct_amt
,sum(data.accjon_cross_amt) as accjon_cross_amt
,sum(data.accjon_acct1_amt) as accjon_acct1_amt
,sum(data.accjon_acct0_amt) as accjon_acct0_amt
,sum(data.accjon_pay_amt) as accjon_pay_amt
,sum(data.total_accjon_amt) as total_accjon_amt
from cimfbview.t98_int_org_app_rela_h org
left join
(
select
syt.Stat_Org_Id as Stat_Org_Id
,sum(syt.jon_acct_num) as jon_acct_num
,sum(syt.jon_cross_num) as jon_cross_num
,sum(syt.jon_pay_num) as jon_pay_num
,sum(syt.total_jon_num) as total_jon_num
,sum(syt.des_acct_num) as des_acct_num
,sum(syt.des_cross_num) as des_cross_num
,sum(syt.des_pay_num) as des_pay_num
,sum(syt.total_des_num) as total_des_num
,sum(syt.accjon_acct_num) as accjon_acct_num
,sum(syt.accjon_cross_num) as accjon_cross_num
,sum(syt.accjon_pay_num) as accjon_pay_num
,sum(syt.total_accjon_num) as total_accjon_num
,sum(syt.accjon_acct_amt) as accjon_acct_amt
,sum(syt.accjon_cross_amt) as accjon_cross_amt
,sum(syt.accjon_acct1_amt) as accjon_acct1_amt
,sum(syt.accjon_acct0_amt) as accjon_acct0_amt
,sum(syt.accjon_pay_amt) as accjon_pay_amt
,sum(syt.total_accjon_amt) as total_accjon_amt
from T98_SYT_ACCT_STAT_D_1 syt
where syt.Stat_Period_Cd = '1'
and syt.Summ_Date = to_date('20161001','yyyymmdd')
and syt.Stat_Org_Attr_Cd = '9'
group by 1
) data
on org.Internal_Org_Id=data.Stat_Org_Id
where (org.Internal_Org_Id = '999999888'
OR ORG.Parent_Int_Org_Id = '999999888'
or '999999888' =
(
case
when '1' = '1' and '2' in ('2','3') then nation_org_id
when '1' = '2' and '2' in ('3','4','5') then prov_org_id
when '1' = '3' and '2' in ('4','5') then city_org_id
when '1' = '4' and '2' = '5' then county_org_id
else Internal_Org_Id
end
)
and org.Int_Org_level_cd in ('1','2')
)
and org.Int_Org_Stru_Type_Cd='1'
and org.Int_Org_Type_Cd in ('01','07','10','05','99')
and org.Start_Dt <= to_date('20161001','yyyymmdd')
and org.End_Dt > to_date('20161001','yyyymmdd')
group by 1,2,3;

查询比较慢,我们首先查看执行计划:

GroupAggregate  (cost=14409317.97..14409320.04 rows=18 width=622) (actual time=472510.542..472510.766 rows=37 loops=1)
Group Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COALESCE
(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
-> Sort (cost=14409317.97..14409318.02 rows=18 width=622) (actual time=472510.519..472510.542 rows=37 loops=1)
Sort Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COA
LESCE(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
Sort Method: quicksort Memory: 34kB
-> Merge Left Join (cost=14404111.77..14409317.60 rows=18 width=622) (actual time=472016.735..472510.437 rows=37 loops=
1)
Merge Cond: (a.internal_org_id = syt.stat_org_id)
-> Sort (cost=68934.45..68934.49 rows=18 width=76) (actual time=452.103..452.133 rows=37 loops=1)
Sort Key: a.internal_org_id
Sort Method: quicksort Memory: 29kB
-> Gather (cost=1000.42..68934.07 rows=18 width=76) (actual time=131.356..451.966 rows=37 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Left Join (cost=0.42..67932.27 rows=8 width=76) (actual time=58.910..398.641 rows=12 l
oops=3)
-> Parallel Seq Scan on t98_int_org_app_rela_h a (cost=0.00..67864.63 rows=8 width=51) (actual
time=58.773..398.006 rows=12 loops=3)
Filter: ((int_org_stru_type_cd = '1'::bpchar) AND (start_dt <= to_date('20161001'::text, 'y
yyymmdd'::text)) AND (end_dt > to_date('20161001'::text, 'yyyymmdd'::text)) AND (int_org_type_cd = ANY ('{01,07,10,05,99}'::bpchar
[])) AND ((internal_org_id = '999999888'::bpchar) OR (parent_int_org_id = '999999888'::bpchar) OR (('999999888'::bpchar = CASE WHE
N ('2'::text = ANY ('{2,3}'::text[])) THEN nation_org_id ELSE internal_org_id END) AND (int_org_level_cd = ANY ('{1,2}'::bpchar[])
))))
Rows Removed by Filter: 549182
-> Index Scan using idx_3 on t04_sys_organization org3 (cost=0.42..8.44 rows=1 width=45) (actua
l time=0.041..0.042 rows=1 loops=37)
Index Cond: ((a.new_org_id = internal_org_id) AND (sys_id = 'S20'::character(3)))
Filter: (province_cd = 71)
-> GroupAggregate (cost=14335177.32..14339939.28 rows=35483 width=586) (actual time=471564.595..472002.661 rows=5
6831 loops=1)
Group Key: syt.stat_org_id
-> Sort (cost=14335177.32..14335317.84 rows=56208 width=100) (actual time=471564.558..471606.793 rows=56832
loops=1)
Sort Key: syt.stat_org_id
Sort Method: quicksort Memory: 9529kB
-> Gather (cost=1000.00..14330742.93 rows=56208 width=100) (actual time=70985.797..471057.045 rows=56
833 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t98_syt_acct_stat_d_1 syt (cost=0.00..14324122.13 rows=23420 width=100)
(actual time=71007.859..471086.192 rows=18944 loops=3)
Filter: ((stat_period_cd = '1'::bpchar) AND (stat_org_attr_cd = '9'::bpchar) AND (summ_date
= to_date('20161001'::text, 'yyyymmdd'::text)))
Rows Removed by Filter: 186677461
Planning time: 2.688 ms
Execution time: 472512.890 ms

通过执行计划看到t98_syt_acct_stat_d_1表并没有走索引(其实表上是有索引的),于是通过调整random_page_cost、seq_page_cost让优化器更偏向于走索引。
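调整方式示意如下(取值仅为示例,实际应结合存储的随机读性能测试后再确定):

SET random_page_cost = 1.1;   -- 默认值为4,调低后优化器更倾向于走索引(会话级设置)
SET seq_page_cost = 1;        -- 保持默认,关键是两者之间的相对比例

调整后得到的执行计划如下: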

GroupAggregate  (cost=9913849.82..9913851.89 rows=18 width=622) (actual time=166908.251..166908.659 rows=37 loops=1)
Group Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COALESCE
(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
-> Sort (cost=9913849.82..9913849.86 rows=18 width=622) (actual time=166908.218..166908.259 rows=37 loops=1)
Sort Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COA
LESCE(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
Sort Method: quicksort Memory: 34kB
-> Nested Loop Left Join (cost=1.43..9913849.44 rows=18 width=622) (actual time=138.763..166907.954 rows=37 loops=1)
-> Merge Left Join (cost=1.00..9913805.16 rows=18 width=627) (actual time=138.717..166905.751 rows=37 loops=1)
Merge Cond: (a.internal_org_id = syt.stat_org_id)
-> Index Scan using idx_2 on t98_int_org_app_rela_h a (cost=0.43..74380.19 rows=18 width=51) (actual time=1
4.101..2392.532 rows=37 loops=1)
Index Cond: (int_org_stru_type_cd = '1'::bpchar)
Filter: ((start_dt <= to_date('20161001'::text, 'yyyymmdd'::text)) AND (end_dt > to_date('20161001'::te
xt, 'yyyymmdd'::text)) AND (int_org_type_cd = ANY ('{01,07,10,05,99}'::bpchar[])) AND ((internal_org_id = '999999888'::bpchar) OR
(parent_int_org_id = '999999888'::bpchar) OR (('999999888'::bpchar = CASE WHEN ('2'::text = ANY ('{2,3}'::text[])) THEN nation_org
_id ELSE internal_org_id END) AND (int_org_level_cd = ANY ('{1,2}'::bpchar[])))))
Rows Removed by Filter: 415239
-> GroupAggregate (cost=0.58..9838981.24 rows=35483 width=586) (actual time=69.743..164300.902 rows=56831 l
oops=1)
Group Key: syt.stat_org_id
-> Index Scan using idx_1 on t98_syt_acct_stat_d_1 syt (cost=0.58..9834359.79 rows=56208 width=100) (
actual time=45.638..163209.973 rows=56832 loops=1)
Index Cond: ((summ_date = to_date('20161001'::text, 'yyyymmdd'::text)) AND (stat_period_cd = '1':
:bpchar) AND (stat_org_attr_cd = '9'::bpchar))
-> Index Scan using idx_3 on t04_sys_organization org3 (cost=0.42..2.45 rows=1 width=45) (actual time=0.041..0.04
3 rows=1 loops=37)
Index Cond: ((a.new_org_id = internal_org_id) AND (sys_id = 'S20'::character(3)))
Filter: (province_cd = 71)
Planning time: 2.604 ms
Execution time: 166909.087 ms

可以看到走索引后执行时间减少了一半多。仔细观察执行计划,下面这一步的COST值消耗较大,并且后续还有一个Merge Left Join,子查询返回的数据太多,导致后面做Merge时效率也很低。t98_syt_acct_stat_d_1这个表的数据量最大,有5.8亿行:

Index Scan using idx_1 on t98_syt_acct_stat_d_1 syt  (cost=0.58..9834359.79 rows=56208 width=100) (
actual time=45.638..163209.973 rows=56832 loops=1)
Index Cond: ((summ_date = to_date('20161001'::text, 'yyyymmdd'::text)) AND (stat_period_cd = '1':
:bpchar) AND (stat_org_attr_cd = '9'::bpchar))

解决方案

我们需要减少子查询返回的结果集,观察SQL语句中的这部分:

on org.Internal_Org_Id=data.Stat_Org_Id
where (org.Internal_Org_Id = '999999888'
OR ORG.Parent_Int_Org_Id = '999999888'
or '999999888' =
(
case
when '1' = '1' and '2' in ('2','3') then nation_org_id
when '1' = '2' and '2' in ('3','4','5') then prov_org_id
when '1' = '3' and '2' in ('4','5') then city_org_id
when '1' = '4' and '2' = '5' then county_org_id
else Internal_Org_Id
end
)
and org.Int_Org_level_cd in ('1','2')
)
and org.Int_Org_Stru_Type_Cd='1'
and org.Int_Org_Type_Cd in ('01','07','10','05','99')
and org.Start_Dt <= to_date('20161001','yyyymmdd')
and org.End_Dt > to_date('20161001','yyyymmdd')

两个表通过org.Internal_Org_Id=data.Stat_Org_Id关联,后面的where条件都是org表的过滤条件,我们将这些过滤条件放到子查询中,减少子查询返回的结果集,这样可以提高效率。
改写后的SQL及执行计划:

 select
org.Internal_Org_Id as Stat_Org_Id --机构编号
,(case when org.Internal_Org_Id = '999999888'
then '合计'
else org.Org_Name
end ) as Stat_Org_Name --机构代码
,(case when org.Internal_Org_Id = '999999888'
then 1
else 2
end) as ORDERID
,sum(data.jon_acct_num) as jon_acct_num
,sum(data.jon_cross_num) as jon_cross_num
,sum(data.jon_pay_num) as jon_pay_num
,sum(data.total_jon_num) as total_jon_num
,sum(data.des_acct_num) as des_acct_num
,sum(data.des_cross_num) as des_cross_num
,sum(data.des_pay_num) as des_pay_num
,sum(data.total_des_num) as total_des_num
,sum(data.accjon_acct_num) as accjon_acct_num
,sum(data.accjon_cross_num) as accjon_cross_num
,sum(data.accjon_pay_num) as accjon_pay_num
,sum(data.total_accjon_num) as total_accjon_num
,sum(data.accjon_acct_amt) as accjon_acct_amt
,sum(data.accjon_cross_amt) as accjon_cross_amt
,sum(data.accjon_acct1_amt) as accjon_acct1_amt
,sum(data.accjon_acct0_amt) as accjon_acct0_amt
,sum(data.accjon_pay_amt) as accjon_pay_amt
,sum(data.total_accjon_amt) as total_accjon_amt
from cimfbview.t98_int_org_app_rela_h org
left join
(
select
syt.Stat_Org_Id as Stat_Org_Id
,sum(syt.jon_acct_num) as jon_acct_num
,sum(syt.jon_cross_num) as jon_cross_num
,sum(syt.jon_pay_num) as jon_pay_num
,sum(syt.total_jon_num) as total_jon_num
,sum(syt.des_acct_num) as des_acct_num
,sum(syt.des_cross_num) as des_cross_num
,sum(syt.des_pay_num) as des_pay_num
,sum(syt.total_des_num) as total_des_num
,sum(syt.accjon_acct_num) as accjon_acct_num
,sum(syt.accjon_cross_num) as accjon_cross_num
,sum(syt.accjon_pay_num) as accjon_pay_num
,sum(syt.total_accjon_num) as total_accjon_num
,sum(syt.accjon_acct_amt) as accjon_acct_amt
,sum(syt.accjon_cross_amt) as accjon_cross_amt
,sum(syt.accjon_acct1_amt) as accjon_acct1_amt
,sum(syt.accjon_acct0_amt) as accjon_acct0_amt
,sum(syt.accjon_pay_amt) as accjon_pay_amt
,sum(syt.total_accjon_amt) as total_accjon_amt
from T98_SYT_ACCT_STAT_D_1 syt
where syt.Stat_Period_Cd = '1'
and syt.Summ_Date = to_date('20161001','yyyymmdd')
and syt.Stat_Org_Attr_Cd = '9'
and syt.Stat_Org_Id in (
select distinct(Internal_Org_Id) from t98_int_org_app_rela_h org where org.Internal_Org_Id = '999999888'
OR ORG.Parent_Int_Org_Id = '999999888'
or '999999888' =
(
case
when '1' = '1' and '2' in ('2','3') then nation_org_id
when '1' = '2' and '2' in ('3','4','5') then prov_org_id
when '1' = '3' and '2' in ('4','5') then city_org_id
when '1' = '4' and '2' = '5' then county_org_id
else Internal_Org_Id
end
)
and org.Int_Org_level_cd in ('1','2')
and org.Int_Org_Stru_Type_Cd='1'
and org.Int_Org_Type_Cd in ('01','07','10','05','99')
and org.Start_Dt <= to_date('20161001','yyyymmdd')
and org.End_Dt > to_date('20161001','yyyymmdd')

)
group by 1
) data
on org.Internal_Org_Id=data.Stat_Org_Id
where (org.Internal_Org_Id = '999999888'
OR ORG.Parent_Int_Org_Id = '999999888'
or '999999888' =
(
case
when '1' = '1' and '2' in ('2','3') then nation_org_id
when '1' = '2' and '2' in ('3','4','5') then prov_org_id
when '1' = '3' and '2' in ('4','5') then city_org_id
when '1' = '4' and '2' = '5' then county_org_id
else Internal_Org_Id
end
)
and org.Int_Org_level_cd in ('1','2')
)
and org.Int_Org_Stru_Type_Cd='1'
and org.Int_Org_Type_Cd in ('01','07','10','05','99')
and org.Start_Dt <= to_date('20161001','yyyymmdd')
and org.End_Dt > to_date('20161001','yyyymmdd')
group by 1,2,3;

GroupAggregate (cost=140463.35..140465.42 rows=18 width=622) (actual time=548.831..549.113 rows=37 loops=1)
Group Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COALESCE
(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
-> Sort (cost=140463.35..140463.40 rows=18 width=622) (actual time=548.789..548.817 rows=37 loops=1)
Sort Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COA
LESCE(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
Sort Method: quicksort Memory: 34kB
-> Nested Loop Left Join (cost=69906.79..140462.98 rows=18 width=622) (actual time=520.434..548.664 rows=37 loops=1)
Join Filter: (a.internal_org_id = data.stat_org_id)
Rows Removed by Join Filter: 669
-> Gather (cost=1000.42..68934.07 rows=18 width=76) (actual time=63.704..89.506 rows=37 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Left Join (cost=0.42..67932.27 rows=8 width=76) (actual time=61.970..389.321 rows=12 loops=3
)
-> Parallel Seq Scan on t98_int_org_app_rela_h a (cost=0.00..67864.63 rows=8 width=51) (actual time=6
1.885..388.864 rows=12 loops=3)
Filter: ((int_org_stru_type_cd = '1'::bpchar) AND (start_dt <= to_date('20161001'::text, 'yyyymmd
d'::text)) AND (end_dt > to_date('20161001'::text, 'yyyymmdd'::text)) AND (int_org_type_cd = ANY ('{01,07,10,05,99}'::bpchar[])) A
ND ((internal_org_id = '999999888'::bpchar) OR (parent_int_org_id = '999999888'::bpchar) OR (('999999888'::bpchar = CASE WHEN ('2'
::text = ANY ('{2,3}'::text[])) THEN nation_org_id ELSE internal_org_id END) AND (int_org_level_cd = ANY ('{1,2}'::bpchar[])))))
Rows Removed by Filter: 549182
-> Index Scan using idx_3 on t04_sys_organization org3 (cost=0.42..8.44 rows=1 width=45) (actual time
=0.027..0.029 rows=1 loops=37)
Index Cond: ((a.new_org_id = internal_org_id) AND (sys_id = 'S20'::character(3)))
Filter: (province_cd = 71)
-> Materialize (cost=68906.37..71450.71 rows=292 width=586) (actual time=12.309..12.389 rows=19 loops=37)
-> Subquery Scan on data (cost=68906.37..71449.25 rows=292 width=586) (actual time=455.410..457.807 rows=40
loops=1)
-> GroupAggregate (cost=68906.37..71446.33 rows=292 width=586) (actual time=455.407..457.739 rows=40
loops=1)
Group Key: syt.stat_org_id
-> Nested Loop (cost=68906.37..71416.40 rows=292 width=100) (actual time=455.322..457.390 rows=
41 loops=1)
-> Unique (cost=68905.79..68907.25 rows=291 width=10) (actual time=455.259..456.424 rows=
44 loops=1)
-> Sort (cost=68905.79..68906.52 rows=292 width=10) (actual time=455.255..455.810 r
ows=710 loops=1)
Sort Key: org.internal_org_id
Sort Method: quicksort Memory: 58kB
-> Gather (cost=1000.00..68893.83 rows=292 width=10) (actual time=0.683..453.
986 rows=725 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t98_int_org_app_rela_h org (cost=0.00..67864.63
rows=122 width=10) (actual time=2.719..448.507 rows=242 loops=3)
Filter: ((internal_org_id = '999999888'::bpchar) OR (parent_int_org
_id = '999999888'::bpchar) OR (('999999888'::bpchar = CASE WHEN ('2'::text = ANY ('{2,3}'::text[])) THEN nation_org_id ELSE intern
al_org_id END) AND (int_org_level_cd = ANY ('{1,2}'::bpchar[])) AND (int_org_stru_type_cd = '1'::bpchar) AND (int_org_type_cd = AN
Y ('{01,07,10,05,99}'::bpchar[])) AND (start_dt <= to_date('20161001'::text, 'yyyymmdd'::text)) AND (end_dt > to_date('20161001'::
text, 'yyyymmdd'::text))))
Rows Removed by Filter: 548953
-> Index Scan using idx_1 on t98_syt_acct_stat_d_1 syt (cost=0.58..8.60 rows=1 width=100)
(actual time=0.016..0.017 rows=1 loops=44)
Index Cond: ((stat_org_id = org.internal_org_id) AND (summ_date = to_date('20161001':
:text, 'yyyymmdd'::text)) AND (stat_period_cd = '1'::bpchar) AND (stat_org_attr_cd = '9'::bpchar))
Planning time: 2.997 ms
Execution time: 565.724 ms

执行时间缩短到不足1秒。通过执行计划可以看到SQL的执行顺序发生了变化:t98_syt_acct_stat_d_1不再是最先被访问的,而是先过滤出internal_org_id,再根据internal_org_id与t98_syt_acct_stat_d_1上的stat_org_id做Nested Loop。
还有一种方式是把子查询提出来改写成连接,这种方式SQL语句看起来比较清爽,效率也不错:

select
org.Internal_Org_Id as Stat_Org_Id --机构编号
,(case when org.Internal_Org_Id = '999999888'
then '合计'
else org.Org_Name
end ) as Stat_Org_Name --机构代码
,(case when org.Internal_Org_Id = '999999888'
then 1
else 2
end) as ORDERID
,sum(data.jon_acct_num ) as jon_acct_num
,sum(data.jon_cross_num ) as jon_cross_num
,sum(data.jon_pay_num ) as jon_pay_num
,sum(data.total_jon_num ) as total_jon_num
,sum(data.des_acct_num ) as des_acct_num
,sum(data.des_cross_num ) as des_cross_num
,sum(data.des_pay_num ) as des_pay_num
,sum(data.total_des_num ) as total_des_num
,sum(data.accjon_acct_num ) as accjon_acct_num
,sum(data.accjon_cross_num ) as accjon_cross_num
,sum(data.accjon_pay_num ) as accjon_pay_num
,sum(data.total_accjon_num ) as total_accjon_num
,sum(data.accjon_acct_amt ) as accjon_acct_amt
,sum(data.accjon_cross_amt ) as accjon_cross_amt
,sum(data.accjon_acct1_amt ) as accjon_acct1_amt
,sum(data.accjon_acct0_amt ) as accjon_acct0_amt
,sum(data.accjon_pay_amt ) as accjon_pay_amt
,sum(data.total_accjon_amt ) as total_accjon_amt
from cimfbview.t98_int_org_app_rela_h org
left join T98_SYT_ACCT_STAT_D_1 data
on org.Internal_Org_Id=data.Stat_Org_Id
and data.Stat_Period_Cd = '1'
and data.Summ_Date = to_date('20161001','yyyymmdd')
and data.Stat_Org_Attr_Cd = '9'
where (org.Internal_Org_Id = '999999888'
OR ORG.Parent_Int_Org_Id = '999999888'
or '999999888' =
(
case
when '1' = '1' and '2' in ('2','3') then nation_org_id
when '1' = '2' and '2' in ('3','4','5') then prov_org_id
when '1' = '3' and '2' in ('4','5') then city_org_id
when '1' = '4' and '2' = '5' then county_org_id
else Internal_Org_Id
end
)
and org.Int_Org_level_cd in ('1','2')
)
and org.Int_Org_Stru_Type_Cd='1'
and org.Int_Org_Type_Cd in ('01','07','10','05','99')
and org.Start_Dt <= to_date('20161001','yyyymmdd')
and org.End_Dt > to_date('20161001','yyyymmdd')
group by 1,2,3
order by 1,2,3;


GroupAggregate (cost=69003.44..69005.51 rows=18 width=622) (actual time=401.488..401.925 rows=37 loops=1)
Group Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COALESCE
(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
-> Sort (cost=69003.44..69003.48 rows=18 width=136) (actual time=401.431..401.487 rows=37 loops=1)
Sort Key: a.internal_org_id, (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN '合计'::character varying ELSE COA
LESCE(org3.org_name, a.org_name) END), (CASE WHEN (a.internal_org_id = '999999888'::bpchar) THEN 1 ELSE 2 END)
Sort Method: quicksort Memory: 34kB
-> Gather (cost=1001.00..69003.06 rows=18 width=136) (actual time=61.302..401.261 rows=37 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Left Join (cost=1.00..68001.21 rows=8 width=136) (actual time=56.932..395.159 rows=12 loops=3)
-> Nested Loop Left Join (cost=0.42..67932.27 rows=8 width=76) (actual time=56.841..394.613 rows=12 loops=3
)
-> Parallel Seq Scan on t98_int_org_app_rela_h a (cost=0.00..67864.63 rows=8 width=51) (actual time=5
6.745..394.173 rows=12 loops=3)
Filter: ((int_org_stru_type_cd = '1'::bpchar) AND (start_dt <= to_date('20161001'::text, 'yyyymmd
d'::text)) AND (end_dt > to_date('20161001'::text, 'yyyymmdd'::text)) AND (int_org_type_cd = ANY ('{01,07,10,05,99}'::bpchar[])) A
ND ((internal_org_id = '999999888'::bpchar) OR (parent_int_org_id = '999999888'::bpchar) OR (('999999888'::bpchar = CASE WHEN ('2'
::text = ANY ('{2,3}'::text[])) THEN nation_org_id ELSE internal_org_id END) AND (int_org_level_cd = ANY ('{1,2}'::bpchar[])))))
Rows Removed by Filter: 549182
-> Index Scan using idx_3 on t04_sys_organization org3 (cost=0.42..8.44 rows=1 width=45) (actual time
=0.026..0.027 rows=1 loops=37)
Index Cond: ((a.new_org_id = internal_org_id) AND (sys_id = 'S20'::character(3)))
Filter: (province_cd = 71)
-> Index Scan using idx_1 on t98_syt_acct_stat_d_1 data (cost=0.58..8.60 rows=1 width=100) (actual time=0.0
34..0.036 rows=1 loops=37)
Index Cond: ((a.internal_org_id = stat_org_id) AND (summ_date = to_date('20161001'::text, 'yyyymmdd'::t
ext)) AND (stat_period_cd = '1'::bpchar) AND (stat_org_attr_cd = '9'::bpchar))
Planning time: 4.092 ms
Execution time: 412.283 ms

总结

优化的本质就是减少对数据的扫描次数,子查询或连接查询时应尽量在底层就把结果集过滤到最小。

观察执行计划要按执行顺序查看,这样才能发现哪一步出现了问题、哪一步返回的数据量最大、哪一步COST值消耗最大。

数据库都是相通的,这里的数据库是PG,但MySQL的优化思路也一样。

PostgreSQL的PITR恢复

基于备份集的恢复比较简单,直接使用备份进行恢复即可;但在实际环境中我们可能会遇到数据误删除,需要指定恢复到某一时间点,也就是基于时间点的恢复(PITR)。

实验步骤

总体步骤如下:

  • 基于数据库做基础备份
  • 插入新数据
  • 删除表中部分数据
  • 基于备份做指定时间点的恢复
  • 验证数据是否恢复

基础备份

  • 使用pg_basebackup
    pg_basebackup  -X s -P -F t -U postgres -D /backup/

    [postgres@pg39 backup]$ ls -ltr
    total 40104
    -rw-rw-r-- 1 postgres postgres 24284160 May 6 01:47 base.tar
    -rw------- 1 postgres postgres 16779264 May 6 01:47 pg_wal.tar
    备份完成后会看到生成两个tar文件,其中pg_wal.tar是因为备份时添加了-X s参数才生成的

插入新数据

  • 向dhytest表中插入新数据并记录时间
    [postgres@pg39 backup]$ psql
    psql (10.3)
    Type "help" for help.

    postgres=# insert into dhytest values (10);
    INSERT 0 1
    postgres=# insert into dhytest values (20);
    INSERT 0 1
    postgres=# insert into dhytest values (30);
    INSERT 0 1
    postgres=# insert into dhytest values (40);
    INSERT 0 1
    postgres=# insert into dhytest values (50);
    INSERT 0 1
    postgres=# select now();
    now
    -------------------------------
    2018-05-06 01:51:52.664137+08

删除数据

  • 删除数据并记录时间点,用于指定时间点恢复时使用
    postgres=# delete from dhytest where id = 40;
    DELETE 1
    postgres=# select now();
    now
    -------------------------------
    2018-05-06 01:52:31.409248+08
    (1 row)

    [postgres@pg39 backup]$ psql
    psql (10.3)
    Type "help" for help.

    //切换几个wal日志
    postgres=# select pg_switch_wal();
    pg_switch_wal
    ---------------
    0/18000730
    (1 row)

    postgres=# select pg_switch_wal();
    pg_switch_wal
    ---------------
    0/19000078
    (1 row)

基于备份做指定时间点的恢复

  • 解压备份文件
    tar xvf base.tar
    mv pg_wal.tar pg_wal/
    tar xvf pg_wal.tar
  • 创建recovery.conf文件
    recovery_target_time = ' 2018-05-06 01:51:52.664137'
    restore_command = 'cp /pgdata/pg_wal/%f %p' //这里没有做wal的归档,所以直接指向了wal的目录;如果设置了wal归档,则指向归档目录即可

验证数据是否恢复

  • 启动数据库后我们查看,数据已全部恢复完成:
    [postgres@pg39 backup]$ pg_ctl start -D /backup/
    waiting for server to start....2018-05-06 02:01:35.282 CST [4404] LOG: listening on IPv4 address "0.0.0.0", port 5432
    2018-05-06 02:01:35.282 CST [4404] LOG: listening on IPv6 address "::", port 5432
    2018-05-06 02:01:35.285 CST [4404] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
    2018-05-06 02:01:35.292 CST [4404] LOG: redirecting log output to logging collector process
    2018-05-06 02:01:35.292 CST [4404] HINT: Future log output will appear in directory "log".
    done
    server started
    [postgres@pg39 backup]$ psql
    psql (10.3)
    Type "help" for help.

    postgres=# select * from dhytest;
    id
    ----
    10
    20
    30
    40
    50
    (5 rows)
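    此时实例通常仍处于恢复(只读)状态,可以先确认状态、核对数据,再决定是否结束恢复对外提供读写(示意,假设recovery_target_action保持默认行为):

    postgres=# select pg_is_in_recovery();        -- t表示仍处于恢复/只读状态
    postgres=# select pg_last_wal_replay_lsn();   -- 查看已回放到的WAL位置
    -- 确认数据无误后,可执行 select pg_wal_replay_resume(); 或 pg_ctl promote 结束恢复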

恢复大致原理

基于时间点恢复的主要过程就是利用全备+WAL日志,将数据库推进到一个指定的状态。在恢复时我们需要注意WAL日志是否完整,如果归档目录下的WAL日志不足以恢复到指定时间点,需要把最新的WAL日志拷贝到归档目录下,才能继续恢复。

解决linux下客户端软件不能输入中文问题

最近安装一套Greenplum集群,安装后使用psql命令创建表、插入中文时一直无法输入中文,最终定位到问题是libedit.so.0的版本导致的。下面介绍排查方式及解决方法。

定位问题原因

  • 客户端不能输入中文,首先验证在linux系统层是否可以输入中文,验证结果是可以的
  • 怀疑是客户端自身原因,于是使用不同版本的客户端测试,发现高版本的客户端可以输入中文
  • 说明原因出在客户端上,但查看两个客户端的client encoding也都是UTF8
  • 通过ldd 查看两个命令引用的动态库
    //高版本psql客户端
    [gpadmin@mdw ~]$ ldd /usr/bin/psql
    linux-vdso.so.1 => (0x00007fff64df2000)
    libpq.so.5 => /usr/local/greenplum/greenplum-db/./lib/libpq.so.5 (0x00007f0ea263f000)
    libssl.so.10 => /usr/lib64/libssl.so.10 (0x000000393f200000)
    libreadline.so.6 => /lib64/libreadline.so.6 (0x0000003937600000)

    //低版本客户端
    [gpadmin@mdw ~]$ ldd /usr/local/greenplum/greenplum-db/bin/psql
    linux-vdso.so.1 => (0x00007fff89b49000)
    libpq.so.5 => /usr/local/greenplum/greenplum-db/./lib/libpq.so.5 (0x00007fcbaca2b000)
    libssl.so.0.9.8 => /usr/local/greenplum/greenplum-db/./lib/libssl.so.0.9.8 (0x00007fcbac7db000)
    libedit.so.0 => /usr/local/greenplum/greenplum-db/./lib/libedit.so.0 (0x00007fcbac5a5000)
    两个客户端处理字符输入引用的动态库是不一样的,分别使用了/lib64/libreadline.so.6和libedit.so.0,我们再查看libedit.so.0的版本:
    cd /usr/local/greenplum/greenplum-db/lib
    ls -ltr libedit.so.0
    发现libedit.so.0是libedit.so.0.0.29的一个软链接,怀疑是版本问题

升级libedit版本

  • http://distfiles.macports.org/libedit/ 下载 libedit-20170329-3.1.tar.gz

  • 上传到服务器上编译安装

    tar -zxvf libedit-20170329-3.1.tar.gz
    ./configure
    make && make install
  • 将原有的软链接unlink,并创建新的软链接

    unlink /usr/local/greenplum/greenplum-db/./lib/libedit.so.0
    ln -s /usr/local/lib/libedit.so.0.0.56 /usr/local/greenplum/greenplum-db/./lib/libedit.so.0
  • 再次登录psql客户端,发现可以输入中文了

    postgres=# create table dhytest (id int, name char(20));
    NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
    HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    CREATE TABLE
    postgres=# insert into dhytest values(10, '董红禹');
    INSERT 0 1
    postgres=# select * from dhytest;
    id | name
    ----+-------------------------
    10 | 董红禹
    (1 row)

问题延伸

在安装Greenplum时会加载相应的环境变量,Greenplum手册中提供了相应的脚本,一般会添加到gpadmin用户的.bash_profile中:

source /usr/local/greenplum/greenplum-db/greenplum_path.sh

加载此环境变量后,使用的一些动态库会指向Greenplum安装目录下的lib目录,这样也会导致在root用户下和gpadmin用户下执行psql时引用不同的动态库、出现不同的结果,这点需要大家注意。

PostgreSQL中添加插件

背景

最近一个客户将SQLServer迁移到PostgreSQL中,性能得到了整体提升,并省去了每年接近百万的License费用。迁移中用户提出一个需求:需要将以前应用里一些加解密的函数封装成PG的函数,但由于这些函数引用了第三方的C++动态库,无法直接用数据库内置的函数方式实现,所以我们通过PG特有的插件(扩展)功能来解决。

添加插件流程

文件结构

  • 封装好的C++类库
    CryptoPG.h
    CryptoPG.cpp
  • 对外接口文件
    • 由于编译时发现直接调用C++类库会与PG源码中的宏定义有重名问题,所以将C++类库再进行一次包装,供C调用
      cs_cryptopp.h
      cs_cryptopp.cpp
  • PG中插件实现的源文件
    cs_cryptopp_wrapper.c

源码实现

对外接口文件

  • 在cs_cryptopp.cpp文件中我们进行一次封装,调用C++类库中的函数
    //cs_cryptopp.cpp
    int generateecdsakey_wrapper(char ** strPrivate, char ** strPublic)
    {
    //调用C++函数
    CryptoPG cry;
    cry.GenerateECDSAKey(strPrivate, strPublic);
    return 0;
    }
    //同时在cs_cryptopp.h中声明函数,extern "C" 代表按C的方式导出符号,供C代码调用
    #ifdef __cplusplus
    extern "C" int generateecdsakey_wrapper(char ** strPrivate, char ** strPublic);
    #endif

PG中插件实现文件

//cs_cryptopp_wrapper.c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "utils/builtins.h"

extern int generateecdsakey_wrapper(char ** strPrivate, char ** strPublic); //引入外部声明函数
extern int signmessage_wrapper(char * strMessage, char ** strSignature);    //下面示例实际调用的封装函数(假设同样由cs_cryptopp对C++类库封装而来)

typedef struct
{
    int CallTime;    //记录SRF已返回的行数
} CallCtx;

PG_MODULE_MAGIC; //必须要添加,证明是PG的模块

PG_FUNCTION_INFO_V1(signmessage); //注册函数,此函数名就是PG中我们需要使用的函数名

/*
* signmessage
*
* 生成电子签名信息
*/
Datum
signmessage(PG_FUNCTION_ARGS){

FuncCallContext *funcctx;
TupleDesc tupdesc;
CallCtx *myCtx;

if (SRF_IS_FIRSTCALL()) {
MemoryContext oldcontext;
funcctx = SRF_FIRSTCALL_INIT();
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

myCtx = (CallCtx *) palloc(sizeof(CallCtx));
myCtx->CallTime = 0;
tupdesc = CreateTemplateTupleDesc(2, false);
TupleDescInitEntry(tupdesc, 1, "Message", TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, 2, "Signature", TEXTOID, -1, 0);
funcctx->user_fctx = myCtx;
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
MemoryContextSwitchTo(oldcontext);
}

funcctx = SRF_PERCALL_SETUP();
myCtx = funcctx->user_fctx;
tupdesc = funcctx->tuple_desc ;

if (myCtx->CallTime == 0) {
char* strPGMessage = PG_GETARG_CSTRING(0); //获取函数传入的值
char* strSignature = NULL ;
signmessage_wrapper(strPGMessage, &strSignature); //调用对应封装好的函数
HeapTuple tuple = NULL;
Datum values[2];
bool isnull[2] = { 0,0 };

//放入返回值
values[0] = CStringGetTextDatum(strPGMessage);
values[1] = CStringGetTextDatum(strSignature);
free(strSignature);
myCtx->CallTime++;
tuple = heap_form_tuple(tupdesc, values, isnull);
SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple)); //返回
//PostgreSQL中针对返回不同类型有专门宏定义 具体请参考源码src/include/fmgr.h文件
} else {
SRF_RETURN_DONE(funcctx);

}
}

SQL文件编写

编写完对应的代码之后,我们还需要添加一个SQL文件,这样在PG中执行CREATE EXTENSION时会运行对应的SQL语句来创建函数:

//创建一个函数对应cs_cryptopp_wrapper.c中函数名
CREATE FUNCTION signmessage(cstring) //接受类型定义为cstring
RETURNS TABLE (Message text, signature text) //返回值我们定义为一个表因为会返回多列
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

编译安装与使用

编译安装

一般PG的插件我们都会放在源码目录下的contrib/项目名称/ 目录中,进入目录执行:

make //编译
su - postgres //切换到postgres用户
pg_ctl stop //关闭数据库
exit //退回到root用户
make install //安装
su - postgres //切换到postgres用户
pg_ctl start //启动数据库

使用

登录pg数据库中执行:

postgres=# drop EXTENSION cs_cryptopp;  //如果之前已创建过EXTENSION
postgres=# CREATE EXTENSION cs_cryptopp;
postgres=# select * from signmessage('2018-01-01 测试');
message | signature
-----------------+------------------------------------
2018-01-01 测试 | )\x06Ū +
| \x0E\x05\x0C\x0E +
| SBHLc?)!\x0EÒ:A\x03h\x1C\72n\x13={
(1 row)

MySQL统计信息相关表介绍

MySQL在mysql库中提供了两个记录统计信息相关内容的表,分别是innodb_table_stats与innodb_index_stats。

innodb_table_stats

这个表里面的内容还是比较好理解的:

[root@shadow:/home/mysql/qdata_for_mysql 5.7.18-log_Instance1 root@localhost:(none) 14:03:33]>select * from mysql.innodb_table_stats where table_name = 'dhytest';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| test | dhytest | 2017-07-13 14:03:32 | 16 | 1 | 1 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.00 sec)

重要的列:

  • last_update 就是最后一次收集统计信息的时间
  • clustered_index_size 聚集索引的page数量
  • sum_of_other_index_sizes 非聚集索引的page数量

通过这些信息我们可以算出聚集索引的大小:

[root@shadow:/home/mysql/qdata_for_mysql 5.7.18-log_Instance1 root@localhost:mysql 15:53:31]>select (clustered_index_size*@@innodb_page_size)/1024/1024 as size from mysql.innodb_table_stats where table_name = 'sbtest1';
+--------------+
| size |
+--------------+
| 107.62500000 |
+--------------+
1 row in set (0.00 sec)

innodb_index_stats

这个表里面输出的内容相对会比较复杂一些。下面是测试用的表结构和数据:

CREATE TABLE t1 (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f)
) ENGINE=INNODB;

[root@shadow:/home/mysql/qdata_for_mysql 5.7.18-log_Instance1 root@localhost:test 21:17:07]>select * from t1;
+---+---+------+------+------+------+
| a | b | c | d | e | f |
+---+---+------+------+------+------+
| 1 | 1 | 10 | 11 | 100 | 101 |
| 1 | 2 | 10 | 11 | 200 | 102 |
| 1 | 3 | 10 | 11 | 100 | 103 |
| 1 | 4 | 10 | 12 | 200 | 104 |
| 1 | 5 | 10 | 12 | 100 | 105 |
+---+---+------+------+------+------+
5 rows in set (0.00 sec)
[root@shadow:/home/mysql/qdata_for_mysql 5.7.18-log_Instance1 root@localhost:test 21:16:41]>select index_name,stat_name,stat_value,stat_description from mysql.innodb_index_stats where table_name = 't1';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i1 | n_leaf_pages | 1 | Number of leaf pages in the index |
| i1 | size | 1 | Number of pages in the index |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
| i2uniq | n_leaf_pages | 1 | Number of leaf pages in the index |
| i2uniq | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
14 rows in set (0.00 sec)

我们主要关注的列:

  • stat_value: 显示统计值的大小

  • stat_description:类型的描述

  • stat_name:此列显示统计的类型 , 会出现下面这些:

    • size:此时stat_value显示索引的page数量

    • n_leaf_pages:此时stat_value显示叶子节点的数量

    • n_diff_pfxNN:显示索引字段前缀上唯一值(distinct)的数量,这里需要特殊说明:

      • 主键索引与唯一索引 例如上面结果中index_name = PRIMARY时:

        • index_name = PRIMARY 且 stat_name = n_diff_pfx01 则stat_value代表主键索引中第一列distinct之后的数量,从上面的例子我们看到stat_value是1,因为a这一列的值都是(1)
        • index_name = PRIMARY 且 stat_name = n_diff_pfx02 则stat_value代表主键索引中第一列和第二列distinct之后的数量,从上面的例子我们看到stat_value是5,因为a,b两列存在的值是(1,1)(1,2)(1,3)(1,4)(1,5)
        • stat_description中我们可以看到是那几个列的信息
        • n_diff_pfxNN 以此类推
      • 非唯一索引不同的地方是,在原有的列之后会附加上主键列。这样说可能比较难理解,针对上面查询出来的结果,下面详细说明下:

        • 根据表结构定义我们知道i1是一个非唯一索引,由(c,d)两列组成。从上面的结果可以看到,除了n_diff_pfx01、n_diff_pfx02,又多出来了n_diff_pfx03、n_diff_pfx04;通过stat_description可以看到,n_diff_pfx03、n_diff_pfx04是在原有的(c,d)两列后又附加了(c,d,a)、(c,d,a,b),也就是把主键列追加到了后面。

        • 例如 n_diff_pfx03 的stat_value是2 代表的就是在原有的非唯一索引上添加了主键索引的第一列(a), 这个时候distinct之后的值是2 所存在的值就是: (10,11,1) (10,12,1)

通过这个表我们可以查看索引的选择性如何,并且可以看到组合索引中每一列的选择性如何。
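例如可以结合innodb_table_stats,估算组合索引中每个前缀列组合的选择性,示意查询如下(值越接近1代表选择性越好,库名test、表名t1沿用上面的例子):

SELECT i.index_name,
       i.stat_description        AS prefix_cols,
       i.stat_value              AS distinct_vals,
       t.n_rows,
       i.stat_value / t.n_rows   AS selectivity
FROM mysql.innodb_index_stats i
JOIN mysql.innodb_table_stats t
  ON i.database_name = t.database_name AND i.table_name = t.table_name
WHERE i.database_name = 'test'
  AND i.table_name = 't1'
  AND i.stat_name LIKE 'n_diff_pfx%';

除此之外,还可以计算索引的大小: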

SELECT
stat_value pages,
index_name,
(stat_value * @@innodb_page_size)/1024/1024 size
FROM
mysql.innodb_index_stats
WHERE
table_name = 'sbtest1'
AND database_name = 'sbtest'
AND stat_description = 'Number of pages in the index'
AND stat_name = 'size'
GROUP BY
index_name;

如果是分区表则使用下面的语句:

SELECT
sum(stat_value) pages,
index_name,
(sum(stat_value) * @@innodb_page_size)/1024/1024 size
FROM
mysql.innodb_index_stats
WHERE
table_name like 't#P%'
AND database_name = 'test'
AND stat_description = 'Number of pages in the index'
AND stat_name = 'size'
GROUP BY
index_name;