微信号:iMySQL_WX

介绍:叶金荣,ORACLE MySQL ACE,专注MySQL十余年,分享MySQL技术及工作心得. 个人站点:http://imysql.com,QQ群:272675472. ACMUG(http://acmug.com)官方指定运营公众号.

MySQL 查询性能优化:不只是索引

2018-03-03 12:48 魏新平

导读

译者魏新平

知数堂第5期MySQL实战班学员,第10期MySQL优化班学员,现任职助教。

原文出处:https://www.percona.com/blog/2018/01/30/is-indexing-always-the-key-to-mysql-query-performance/

原文作者:Peter Zaitsev 

在本文当中,我将研究优化索引是否总是提高MySQL查询性能的关键。(剧透一下,不是)

In this blog post, I’ll look at whether optimizing indexing is always the key to improving MySQL query performance (spoiler, it isn’t).

让我们看一下下面这个例子:

Let’s look at this query for illustration:

mysql> show create table tbl G *************************** 1. row ***************************      Table: tbl Create Table: CREATE TABLE `tbl` ( `id` int(11) NOT NULL AUTO_INCREMENT, `k` int(11) NOT NULL DEFAULT '0', `g` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G *************************** 1. row ***************************          id: 1 select_type: SIMPLE       table: tbl  partitions: NULL        type: ALL possible_keys: k_1         key: NULL     key_len: NULL         ref: NULL        rows: 998490    filtered: 50.00       Extra: Using where; Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | | 153286 |  8 | ... | 934984 |  8 | +--------+----+ 22 rows in set (6.80 sec)

当看到这样的语句,很多人会认为主要的原因是全表扫描。那么有人就会奇怪了,为什么MySQL的优化器不使用索引呢(那是因为该语句查询的数据不具备选择性,简单点说就是获取的数据相对于整张表来说太多了)这样的想法就会导致有些人强制使用索引,但是效果只会更差。

Looking at this query, many might assume the main problem is that this query is doing a full table scan. One could wonder then, “Why does the MySQL optimizer not use index (k)?” (It is because the clause is not selective enough, by the way.) This thought might cause someone to force using the index, and get even worse performance。

mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | ... | 934984 |  8 | +--------+----+ 22 rows in set (9.37 sec)

或许有些人会把基于k列的单列索引扩展成基于k和g列的联合索引。事实上,这样并不会有任何效果。

Or someone might extend the index on (k) to (k,g) to be a covering index for this query. This won’t improve performance either:

mysql> alter table tbl drop key k_1, add key(k,g); Query OK, 0 rows affected (5.35 sec) Records: 0  Duplicates: 0  Warnings: 0 mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G *************************** 1. row ***************************          id: 1 select_type: SIMPLE       table: tbl  partitions: NULL        type: range possible_keys: k         key: k     key_len: 4         ref: NULL        rows: 499245    filtered: 100.00       Extra: Using where; Using index; Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; +--------+----+ | g      | c  | +--------+----+ |  28846 |  8 | | 139660 |  8 | ... | 915436 |  8 | | 934984 |  8 | +--------+----+ 22 rows in set (6.80 sec)

上述两种错误的优化思路完全是因为我们一直在错误的方向浪费精力:即迅速获取满足k<1000000的行。然而真正的问题并不是快速获取这些数据。如果我们把GROUP BY子句去掉,我们会的发现执行速度快了惊人的10倍。

This wasted effort is all due to focusing on the wrong thing: figuring out how can we find all the rows that match k<1000000 as soon as possible. This is not the problem in this case. In fact, the query that touches all the same columns but doesn’t use GROUP BY runs 10 times as fast:

mysql> select sum(g) from tbl where k<1000000; +--------------+ | sum(g)       | +--------------+ | 500383719481 | +--------------+ 1 row in set (0.68 sec)

针对这个特殊的语句,是否使用索引获取数据并不是主要的问题,我们应该关注于如何优化GROUP BY,这才是问题的所在。

For this particular query, whether or not it is using the index for lookup should not be the main question. Instead, we should look at how to optimize GROUP BY – which is responsible for some 90% of the query response time.

在我的下篇博客当中,我将介绍4种MySQL执行GROUP BY的方法,来帮助更好的优化类似语句。

In my next blog post, I will write about four ways to execute the MySQL GROUP BY operation to provide further help on optimizing these queries.


扫码加入知数堂技术交流QQ群

(群号:579036588)

群内可@各位助教了解更多课程信息




知数堂

叶金荣与吴炳锡联合打造

领跑IT精英培训

行业资深专家强强联合,倾心定制

MySQL实战/MySQL优化/大数据实战 / Python/ SQL优化

数门精品课程

紧随技术发展趋势,定期优化培训教案

融入大量生产案例,贴合企业一线需求

社群陪伴学习,一次报名,可学3期

DBA、开发工程师必修课

上千位学员已华丽转身,薪资翻番,职位提升

改变已悄然发生,你还在等什么?

扫码下载知数堂精品课程试听视频

或点击“阅读原文”直达下载地址

(MySQL 实战/优化、大数据实战、Python开发,及SQL优化等课程)

密码:hg3h






 
老叶茶馆 更多文章 周四见|《时序数据库InfluxDB原理及应用初探》by晓伟 周四见|《时序数据库InfluxDB原理及应用初探》by晓伟 使用MySQL 5.7虚拟列提高查询效率 如何更快随机UPDATE? 简单几招提高MySQL安全性
猜您喜欢 Xcode装B插件:打字震屏幕火花效果 孙元浩将在2016中国(上海)大数据产业创新峰会发表主题演讲 Github热度周排行第一,阿里Java规约插件开发之路曝光 IJCAI16论文速读:Deep Learning论文选读(上) 【rocksdb系列】如何在机械盘下使用rocksdb