一. 索引

1. 索引的定义

索引是一种单独的、物理的对数据库表中一列或多列的值进行排序的一种存储结构，它是某个表中一列或若干列值的集合和相应的指向表中物理标识这些值的数据页的逻辑指针清单。索引的作用相当于图书的目录，可以根据目录中的页码快速找到所需的内容。

2. 二分查找

二分查找法（binary search）也称为折半查找法，用来查找一组有序的记录数组中的某一记录，其基本思想是：将记录按有序化（递增或递减）排列，在查找过程中采用跳跃式方式查找，即先以有序数列的中点位置作为比较对象，如果要找的元素值小于该中点元素，则将待查序列缩小为左半部分，否则为右半部分。通过一次比较，将查找区间缩小一半。（O(logN)的时间复杂度）

# 给出一个例子，注意该例子已经是升序排序的，且查找 数字 48

数据：5， 10， 19， 21， 31， 37， 42， 48， 50， 52

下标：0，  1，  2，  3，  4，  5，  6，  7，  8，  9

步骤一：设low为下标最小值0，high为下标最大值9;
步骤二：通过low和high得到mid，mid=(low + high)/ 2，初始时mid为下标4(也可以=5，看具体算法实现)；
步骤三：mid=4对应的数据值是31，31 < 48（我们要找的数字）；
步骤四：通过二分查找的思路，将low设置为31对应的下标4，high保持不变为9，此时mid为6；
步骤五：mid=6对应的数据值是42，42 < 48（我们要找的数字）；
步骤六：通过二分查找的思路，将low设置为42对应的下标6，high保持不变为9，此时mid为7；
步骤七：mid=7对应的数据值是48，48 == 48（我们要找的数字），查找结束；

通过3次二分查找就找到了我们所要的数字，而顺序查找需8次.

二. Binary Tree(二叉树)

二叉树 wiki

1. 二叉树的定义

每个节点至多只有二棵子树；
二叉树的子树有左右之分，次序不能颠倒；
一棵深度为k，且有 $2^k-1$ 个节点，称为满二叉树(Full Tree)；
一棵深度为k，且root到k-1层的节点树都达到最大，第k层的所有节点都连续集中在最左边，此时为完全二叉树（Complete Tree）

015：索引、B+树

2. 平衡二叉树（AVL-树）

平衡二叉树 wiki

左子树和右子树都是平衡二叉树；
左子树和右子树的高度差绝对值不超过1；
- 平衡二叉树
- 非平衡二叉树
平衡二叉树的遍历

以上面平衡二叉树的图例为样本，进行遍历：

前序：6, 3, 2, 5, 7, 8（ROOT节点在开头, 中-左-右顺序）
中序：2, 3, 5, 6, 7, 8（中序遍历即为升序，左-中-右顺序）
后序：2, 5, 3, 8, 7, 6（ROOT节点在结尾，左-右-中 顺序）

1：可以通过前序和中序 或者是 后序和中序来推导出一个棵树

2：前序或者后序用来得到ROOT节点，中序可以区分左右子树

平衡二叉树的旋转

015：索引、B+树

需要通过旋转（左旋，右旋）来维护平衡二叉树的平衡，在添加和删除的时候需要有额外的开销。

三. B树/B+树

B树 wiki介绍

B+树 wiki介绍

注意:B树和B+树开头的B不是Binary，而是Balance

3.1. B树的定义

阶为M (节点上关键字(Keys)的个数) 的B树的定义：

每个节点最多有M个孩子；
除了root节点外，每个非叶子(non-leaf)节点至少含有(M/2)个孩子；
如果root节点不为空，则root节点至少要有两个孩子节点；
一个非叶子(non-leaf)节点如果含有K个孩子，则包含k-1个keys；
所有叶子节点都在同一层；
B树中的非叶子(non-leaf)节点也包含了数据部分；

3.2. B+树的定义

在B树的基础上，B+树做了如下改进

数据只存储在叶子节点上，非叶子节点只保存索引信息；
- 非叶子节点（索引节点）存储的只是一个Flag，不保存实际数据记录；
- 索引节点指示该节点的左子树比这个Flag小，而右子树大于等于这个Flag
叶子节点本身按照数据的升序排序进行链接(串联起来)；
- 叶子节点中的数据在物理存储上是无序的，仅仅是在逻辑上有序（通过指针串在一起）；

3.3. B+树的作用

在块设备上，通过B+树可以有效的存储数据；
所有记录都存储在叶子节点上，非叶子(non-leaf)存储索引(keys)信息；
B+树含有非常高的扇出（fanout），通常超过100，在查找一个记录时，可以有效的减少IO操作；

3.4. B+树的操作

B+树的插入
- B+树的插入必须保证插入后叶子节点中的记录依然排序。
- 插入操作步骤

015：索引、B+树

>B+树总是会保持平衡。但是为了保持平衡对于新插入的键值可能需要做大量的拆分页（split）操作；部分情况下可以通过B+树的旋转来替代拆分页操作，进而达到平衡效果。

B+树的删除
- B+树使用填充因子（fill factor）来控制树的删除变化,50%是填充因子可设的最小值。B+树的删除操作同样必须保证删除后叶子节点中的记录依然排序。与插入不同的是，删除根据填充因子的变化来衡量。
- 删除操作步骤

015：索引、B+树

3.5. B+树的扇出(fan out)

B+树图例

015：索引、B+树

该 B+ 树高度为 2
每叶子页（LeafPage）4条记录
扇出数为5
叶子节点(LeafPage)由小到大（有序）串联在一起

扇出 是每个索引节点(Non-LeafPage)指向每个叶子节点(LeafPage)的指针

扇出数 = 索引节点(Non-LeafPage)可存储的最大关键字个数 + 1

图例中的索引节点(Non-LeafPage)最大可以存放4个关键字，但实际使用了3个；

3.6. B+树存储数据举例

假设B+树中页的大小是16K，每行记录是200Byte大小，求出树的高度为1，2，3，4时，分别可以存储多少条记录。

查看数据表中每行记录的平均大小

mysql gcdb@localhost:employees> show table status like '%employees%'\G;

***************************[ 1. row ]***************************

Name            | employees

Engine          | InnoDB

Version         | 10

Row_format      | Dynamic

Rows            | 299423

Avg_row_length  | 50            --平均一行记录

Data_length     | 15220736

Max_data_length | 0

Index_length    | 0

Data_free       | 4194304

Auto_increment  | <null>

Create_time     | 2017-12-06 16:59:37

Update_time     | 2017-12-06 16:59:43

Check_time      | <null>

Collation       | utf8_general_ci

Checksum        | <null>

Create_options  |

Comment         |

高度为1
- 16K/200B 约等于 80 个记录（数据结构元信息如指针等忽略不计）
高度为2

非叶子节点中存放的仅仅是一个索引信息，包含了Key和Point指针；Point指针在MySQL中固定为6Byte。而Key我们这里假设为8Byte，则单个索引信息即为14个字节，KeySize = 14Byte
- 高度为2，即有一个索引节点（索引页），和N个叶子节点
- 一个索引节点可以存放 16K / KeySize = 16K / 14B = 1142个索引信息，即有（1142 + 1）个扇出，以及有（1142 + 1）个叶子节点（数据页） （可以简化为1000）
- 数据记录数 = （16K / KeySize + 1）x (16K / 200B) 约等于 80W 个记录
高度为3

高度为3的B+树，即ROOT节点有1000个扇出，每个扇出又有1000个扇出指向叶子节点。每个节点是80个记录，所以一共有 8000W个记录
高度为4

同高度3一样，高度为4时的记录书为（8000 x 1000）W

上述的8000W等数据只是一个理论值。线上实际使用单个页的记录数字要乘以70%，即第二层需要70% x 70% ，依次类推。

因此在数据库中，B+树的高度一般都在2～4层，这也就是说查找某一键值的行记录时最多只需要2到4次IO，2～4次的IO意味着查询时间只需0.02～0.04秒（假设IOPS=100，当前SSD可以达到50000IOPS）。

从5.7开始，页的预留大小可以设置了，以减少split操作的概率（空间换时间）

四. MySQL索引

4.1. MySQL 创建索引

ALTER TABLE 方式

CREATE INDEX 方式

mysql gcdb@localhost:mytest> create table t_index01(id int,age int,name varchar(10));

Query OK, 0 rows affected

Time: 0.137s

mysql gcdb@localhost:mytest> select * from t_index01;

+----+-----+----------+

| id | age | name     |

+----+-----+----------+

| 1  | 10  | xuliuyan |

| 2  | 20  | haozong  |

| 3  | 30  | wudi     |

+----+-----+----------+

3 rows in set

Time: 0.012s

mysql gcdb@localhost:mytest> explain select * from t_index01 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL  -- 看执行计划，使用的是扫描整张表的方式

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 3

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest> alter table t_index01 add index t_index01_inx_id(id);  -- 给字段id添加普通索引。

Query OK, 0 rows affected

Time: 0.074s

mysql gcdb@localhost:mytest> explain select * from t_index01 where id =3 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ref         -- 看执行计划，使用索引t_index01_inx_id

possible_keys | t_index01_inx_id

key           | t_index01_inx_id

key_len       | 5

ref           | const

rows          | 1

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.012s

-- 使用create index

mysql gcdb@localhost:mytest> explain select * from t_index01 where age =20\G; -- 同样age字段也没有索引

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 3

filtered      | 33.33

Extra         | Using where

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest> create index t_index01_age on t_index01(age);  -- 给age字段增加索引

Query OK, 0 rows affected

Time: 0.028s

mysql gcdb@localhost:mytest> explain select * from t_index01 where age =20\G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ref

possible_keys | t_index01_age

key           | t_index01_age        -- 查看执行计划，使用的key为t_index01_age，走了索引

key_len       | 5

ref           | const

rows          | 1

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest>

4.2. MySQL 查看索引

--

-- 方式一

--

mysql gcdb@localhost:mytest> use dbt3;

You are now connected to database "dbt3" as user "gcdb"

Time: 0.002s

mysql gcdb@localhost:dbt3> desc orders;

+-----------------+-------------+------+-----+---------+-------+

| Field           | Type        | Null | Key | Default | Extra |

+-----------------+-------------+------+-----+---------+-------+

| o_orderkey      | int(11)     | NO   | PRI | NULL    |       | -- 索引

| o_custkey       | int(11)     | YES  | MUL | NULL    |       | -- 索引

| o_orderstatus   | char(1)     | YES  |     | NULL    |       |

| o_totalprice    | double      | YES  |     | NULL    |       |

| o_orderDATE     | date        | YES  | MUL | NULL    |       | -- 索引

| o_orderpriority | char(15)    | YES  |     | NULL    |       |

| o_clerk         | char(15)    | YES  |     | NULL    |       |

| o_shippriority  | int(11)     | YES  |     | NULL    |       |

| o_comment       | varchar(79) | YES  |     | NULL    |       |

+-----------------+-------------+------+-----+---------+-------+

9 rows in set (0.00 sec)

--

-- 方式二

--

mysql gcdb@localhost:dbt3> show create table orders \G;

***************************[ 1. row ]***************************

Table        | orders

Create Table | CREATE TABLE `orders` (

  `o_orderkey` int(11) NOT NULL,

  `o_custkey` int(11) DEFAULT NULL,

  `o_orderstatus` char(1) DEFAULT NULL,

  `o_totalprice` double DEFAULT NULL,

  `o_orderDATE` date DEFAULT NULL,

  `o_orderpriority` char(15) DEFAULT NULL,

  `o_clerk` char(15) DEFAULT NULL,

  `o_shippriority` int(11) DEFAULT NULL,

  `o_comment` varchar(79) DEFAULT NULL,

  PRIMARY KEY (`o_orderkey`),  -- 索引

  KEY `i_o_orderdate` (`o_orderDATE`), -- 索引

  KEY `i_o_custkey` (`o_custkey`)  -- 索引

) ENGINE=InnoDB DEFAULT CHARSET=latin1

1 row in set

Time: 0.038s

mysql gcdb@localhost:dbt3>

--

-- 方式三

--

mysql gcdb@localhost:dbt3> show index from orders\G;

***************************[ 1. row ]***************************

Table         | orders

Non_unique    | 0                -- 表示唯一

Key_name      | PRIMARY          -- key的name是primary

Seq_in_index  | 1

Column_name   | o_orderkey

Collation     | A

Cardinality   | 1369767         -- 基数，这个列上不同值的记录数

Sub_part      | <null>

Packed        | <null>

Null          |

Index_type    | BTREE           -- 索引类型是BTree

Comment       |

Index_comment |

***************************[ 2. row ]***************************

Table         | orders

Non_unique    | 1           -- Non_unique为True，表示不唯一

Key_name      | i_o_orderdate

Seq_in_index  | 1

Column_name   | o_orderDATE

Collation     | A

Cardinality   | 2566

Sub_part      | <null>

Packed        | <null>

Null          | YES

Index_type    | BTREE

Comment       |

Index_comment |

***************************[ 3. row ]***************************

Table         | orders

Non_unique    | 1

Key_name      | i_o_custkey

Seq_in_index  | 1

Column_name   | o_custkey

Collation     | A

Cardinality   | 100450

Sub_part      | <null>

Packed        | <null>

Null          | YES

Index_type    | BTREE

Comment       |

Index_comment |

(END)

mysql gcdb@localhost:dbt3>  select count(*) from orders;

+----------+

| count(*) |

+----------+

| 1500000  |  -- orders中有1500000条记录，和Cardinality 是不一致的

+----------+

1 row in set

Time: 3.682s

mysql gcdb@localhost:dbt3>

4.3. Cardinality（基数）

Cardinality表示该索引列上有多少不同的记录，这个是一个预估的值，是采样得到的（由InnoDB触发，采样20个页，进行预估），该值越大越好，即当Cardinality / RowNumber 越接近1越好,表示该列是高选择性的。

高选择性：身份证、手机号码、姓名、订单号等
低选择性：性别、年龄等

即该列是否适合创建索引，就看该字段是否具有高选择性

mysql gcdb@localhost:dbt3>   show create table lineitem\G

***************************[ 1. row ]***************************

Table        | lineitem

Create Table | CREATE TABLE `lineitem` (

  `l_orderkey` int(11) NOT NULL,

  `l_partkey` int(11) DEFAULT NULL,

  `l_suppkey` int(11) DEFAULT NULL,

  `l_linenumber` int(11) NOT NULL,

  `l_quantity` double DEFAULT NULL,

  `l_extendedprice` double DEFAULT NULL,

  `l_discount` double DEFAULT NULL,

  `l_tax` double DEFAULT NULL,

  `l_returnflag` char(1) DEFAULT NULL,

  `l_linestatus` char(1) DEFAULT NULL,

  `l_shipDATE` date DEFAULT NULL,

  `l_commitDATE` date DEFAULT NULL,

  `l_receiptDATE` date DEFAULT NULL,

  `l_shipinstruct` char(25) DEFAULT NULL,

  `l_shipmode` char(10) DEFAULT NULL,

  `l_comment` varchar(44) DEFAULT NULL,

  PRIMARY KEY (`l_orderkey`,`l_linenumber`),  --两个列复合主键

  KEY `i_l_shipdate` (`l_shipDATE`),

  KEY `i_l_suppkey_partkey` (`l_partkey`,`l_suppkey`),

  KEY `i_l_partkey` (`l_partkey`),

  KEY `i_l_suppkey` (`l_suppkey`),

  KEY `i_l_receiptdate` (`l_receiptDATE`),

  KEY `i_l_orderkey` (`l_orderkey`),

  KEY `i_l_orderkey_quantity` (`l_orderkey`,`l_quantity`),

  KEY `i_l_commitdate` (`l_commitDATE`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

1 row in set

Time: 0.010s

mysql gcdb@localhost:dbt3> show index from lineitem\G

***************************[ 1. row ]***************************

Table         | lineitem

Non_unique    | 0

Key_name      | PRIMARY

Seq_in_index  | 1      -- 主键索引中的顺序，该列的顺序为1

Column_name   | l_orderkey

Collation     | A

Cardinality   | 1352854

Sub_part      | <null>

Packed        | <null>

Null          |

Index_type    | BTREE

Comment       |

Index_comment |

***************************[ 2. row ]***************************

Table         | lineitem

Non_unique    | 0

Key_name      | PRIMARY

Seq_in_index  | 2     -- 主键索引中的顺序，该列的顺序为2

Column_name   | l_linenumber

Collation     | A

Cardinality   | 5629794

Sub_part      | <null>

Packed        | <null>

Null          |

Index_type    | BTREE

Comment       |

Index_comment |

***************************[ 3. row ]***************************

Table         | lineitem

Non_unique    | 1

Key_name      | i_l_shipdate

Seq_in_index  | 1

Column_name   | l_shipDATE

Collation     | A

Cardinality   | 2495

Sub_part      | <null>

Packed        | <null>

Null          | YES

Index_type    | BTREE

Comment       |

对应当前例子

第一个索引（Seq_in_index = 1）的Cardinality的值表示当前列（l_orderkey）的不重复的值，

第二个索引（Seq_in_index = 2）的Cardinality的值表示前两列（l_orderkey）和（l_linenumber）不重复的值

--

-- SQL-1

--

mysql gcdb@localhost:dbt3> select l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipDATE,l_commitDATE,l_receiptDATE  fr

                        -> om lineitem limit 10;

+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+

| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipDATE | l_commitDATE | l_receiptDATE |

+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+

| 1          | 155190    | 7706      | 1            | 17.0       | 21168.23        | 0.04       | 0.02  | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    |

| 1          | 67310     | 7311      | 2            | 36.0       | 45983.16        | 0.09       | 0.06  | N            | O            | 1996-04-12 | 1996-02-28   | 1996-04-20    |

| 1          | 63700     | 3701      | 3            |  8.0       | 13309.6         | 0.1        | 0.02  | N            | O            | 1996-01-29 | 1996-03-05   | 1996-01-31    |

| 1          | 2132      | 4633      | 4            | 28.0       | 28955.64        | 0.09       | 0.06  | N            | O            | 1996-04-21 | 1996-03-30   | 1996-05-16    |

| 1          | 24027     | 1534      | 5            | 24.0       | 22824.48        | 0.1        | 0.04  | N            | O            | 1996-03-30 | 1996-03-14   | 1996-04-01    |

| 1          | 15635     | 638       | 6            | 32.0       | 49620.16        | 0.07       | 0.02  | N            | O            | 1996-01-30 | 1996-02-07   | 1996-02-03    |

| 2          | 106170    | 1191      | 1            | 38.0       | 44694.46        | 0.0        | 0.05  | N            | O            | 1997-01-28 | 1997-01-14   | 1997-02-02    |

| 3          | 4297      | 1798      | 1            | 45.0       | 54058.05        | 0.06       | 0.0   | R            | F            | 1994-02-02 | 1994-01-04   | 1994-02-23    |

| 3          | 19036     | 6540      | 2            | 49.0       | 46796.47        | 0.1        | 0.0   | R            | F            | 1993-11-09 | 1993-12-20   | 1993-11-24    |

| 3          | 128449    | 3474      | 3            | 27.0       | 39890.88        | 0.06       | 0.07  | A            | F            | 1994-01-16 | 1993-11-22   | 1994-01-23    |

+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+

10 rows in set

Time: 0.016s

--

-- SQL-2

--

mysql gcdb@localhost:dbt3> select l_orderkey, l_linenumber from lineitem limit 10;

+------------+--------------+

| l_orderkey | l_linenumber |

+------------+--------------+

| 721220     | 2            |

| 842980     | 4            |

| 904677     | 1            |

| 990147     | 1            |

| 1054181    | 1            |

| 1111877    | 3            |

| 1332613    | 1            |

| 1552449    | 2            |

| 2167527    | 3            |

| 2184032    | 5            |

+------------+--------------+

10 rows in set

Time: 0.011s

--- SQL-1和SQL-2其实都是在没有排序的情况下，取出前10条数据。但是结果不一样

--

-- SQL-3

--

mysql gcdb@localhost:dbt3> select l_orderkey, l_linenumber from lineitem order by l_orderkey limit 10;

 -- 和上面的sql相比，多了一个order by的操作

+------------+--------------+

| l_orderkey | l_linenumber |

+------------+--------------+

|          1 |            1 | -----

|          1 |            2 | -- 看orderkey为1，对应的linenumber有6条

|          1 |            3 | -- 这就是orderkey的Cardinality仅为140W

|          1 |            4 | -- 而(orderkey + linenumber)就有580W

|          1 |            5 |

|          1 |            6 | -----

|          2 |            1 |

|          3 |            1 |

|          3 |            2 |

|          3 |            3 |

+------------+--------------+

10 rows in set (0.01 sec)

--- SQL-3 和SQL-2 不同的原因是 他们走了不同的索引

mysql gcdb@localhost:dbt3> explain  select l_orderkey, l_linenumber from lineitem limit 10 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | lineitem

partitions    | <null>

type          | index

possible_keys | <null>

key           | i_l_shipdate -- 使用了shipdate进行了索引

key_len       | 4

ref           | <null>

rows          | 5630402

filtered      | 100.0

Extra         | Using index

1 row in set

Time: 0.010s

mysql gcdb@localhost:dbt3> explain select l_orderkey, l_linenumber from lineitem order by l_orderkey limit 10\G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | lineitem

partitions    | <null>

type          | index

possible_keys | <null>

key           | i_l_orderkey    -- 使用了orderkey进行了查询

key_len       | 4

ref           | <null>

rows          | 10

filtered      | 100.0

Extra         | Using index

1 row in set

Time: 0.012s

mysql gcdb@localhost:dbt3> explain select * from lineitem limit 10\G

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | lineitem

partitions    | <null>

type          | ALL         -- 显示进行了全表扫描

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 5630402

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.012s

mysql gcdb@localhost:dbt3>

mysql gcdb@localhost:dbt3> analyze table lineitem;

+---------------+---------+----------+----------+

| Table         | Op      | Msg_type | Msg_text |

+---------------+---------+----------+----------+

| dbt3.lineitem | analyze | status   | OK       |

+---------------+---------+----------+----------+

1 row in set

Time: 2.252s

mysql gcdb@localhost:dbt3>

innodb_on_state = off

在MySQL5.5之前，执行show create table操作会触发采样，而5.5之后将该参数off后，需要主动执行analyze table才会去采样。采样不会锁表或者锁记录。

4.4 复合索引

mysql gcdb@localhost:dbt3>   show create table lineitem\G

***************************[ 1. row ]***************************

Table        | lineitem

Create Table | CREATE TABLE `lineitem` (

  `l_orderkey` int(11) NOT NULL,

  `l_partkey` int(11) DEFAULT NULL,

  `l_suppkey` int(11) DEFAULT NULL,

  `l_linenumber` int(11) NOT NULL,

  `l_quantity` double DEFAULT NULL,

  `l_extendedprice` double DEFAULT NULL,

  `l_discount` double DEFAULT NULL,

  `l_tax` double DEFAULT NULL,

  `l_returnflag` char(1) DEFAULT NULL,

  `l_linestatus` char(1) DEFAULT NULL,

  `l_shipDATE` date DEFAULT NULL,

  `l_commitDATE` date DEFAULT NULL,

  `l_receiptDATE` date DEFAULT NULL,

  `l_shipinstruct` char(25) DEFAULT NULL,

  `l_shipmode` char(10) DEFAULT NULL,

  `l_comment` varchar(44) DEFAULT NULL,

  PRIMARY KEY (`l_orderkey`,`l_linenumber`),  --两个列复合主键

  KEY `i_l_shipdate` (`l_shipDATE`),

  KEY `i_l_suppkey_partkey` (`l_partkey`,`l_suppkey`),

  KEY `i_l_partkey` (`l_partkey`),

  KEY `i_l_suppkey` (`l_suppkey`),

  KEY `i_l_receiptdate` (`l_receiptDATE`),

  KEY `i_l_orderkey` (`l_orderkey`),

  KEY `i_l_orderkey_quantity` (`l_orderkey`,`l_quantity`),

  KEY `i_l_commitdate` (`l_commitDATE`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

1 row in set

Time: 0.010s

--

-- 复合索引举例

--

mysql gcdb@localhost:dbt3> use mytest;

You are now connected to database "mytest" as user "gcdb"

Time: 0.002s

mysql gcdb@localhost:mytest> create index  idx_index01_id_age  on t_index01(id,age); --创建复合索引

Query OK, 0 rows affected

Time: 0.009s

mysql gcdb@localhost:mytest> insert t_index01  values(1,11,'dangdangxu'),(2,12,'quannenghao'),(3,13,'paogeb

                          -> an'),(4,14,'xiaolaopo');

Query OK, 4 rows affected

Time: 0.003s

mysql gcdb@localhost:mytest> select * from  t_index01;

+----+-----+------------+

| id | age | name       |

+----+-----+------------+

| 1  | 10  | xuliuyan   |

| 2  | 20  | haozong    |

| 3  | 30  | wudi       |

| 1  | 11  | dangdangxu |

| 2  | 12  | quannengha |

| 3  | 13  | paogeban   |

| 4  | 14  | xiaolaopo  |

+----+-----+------------+

7 rows in set

Time: 0.011s

mysql gcdb@localhost:mytest> drop index t_index01_inx_id on t_index01; --删除原先id列上索引t_index01_inx_id

You're about to run a destructive command.

Do you want to proceed? (y/n): y

Your call!

Query OK, 0 rows affected

Time: 0.055s

mysql gcdb@localhost:mytest> drop index t_index01_age on t_index01;   --删除原先age列上索引t_index01_inx_age

You're about to run a destructive command.

Do you want to proceed? (y/n): y

Your call!

Query OK, 0 rows affected

Time: 0.005s

mysql gcdb@localhost:mytest> select * from  t_index01 where id =2;

+----+-----+------------+

| id | age | name       |

+----+-----+------------+

| 2  | 12  | quannengha |

| 2  | 20  | haozong    |

+----+-----+------------+

2 rows in set

Time: 0.011s

mysql gcdb@localhost:mytest> explain select * from  t_index01 where id =2 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ref

possible_keys | idx_index01_id_age

key           | idx_index01_id_age  --走复合索引

key_len       | 5

ref           | const

rows          | 2

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.011s

mysql gcdb@localhost:mytest> explain select * from  t_index01 where id =2 and age=12 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ref

possible_keys | idx_index01_id_age

key           | idx_index01_id_age   --走复合索引

key_len       | 10

ref           | const,const

rows          | 1

filtered      | 100.0

Extra         | <null>

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest> explain select * from  t_index01 where  age=12 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL       --不走索引，全表扫描

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 7

filtered      | 14.29

Extra         | Using where

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest>

mysql gcdb@localhost:mytest> explain select * from  t_index01 where id =2 or age=12 \G; --求并集

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL    -- 没有使用索引，因为age列没有索引，所以age列是走全表扫描的

possible_keys | idx_index01_id_age

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 7

filtered      | 35.71

Extra         | Using where

1 row in set

Time: 0.013s

--

-- 覆盖索引

--

---- 还是只使用age列去做范围查询，发现是走索引了

mysql gcdb@localhost:mytest> explain select id,age from  t_index01 where age>12 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | index

possible_keys | <null>

key           | idx_index01_id_age

key_len       | 10

ref           | <null>

rows          | 7

filtered      | 33.33

Extra         | Using where; Using index    -- 覆盖索引

1 row in set

Time: 0.011s

mysql gcdb@localhost:mytest>

-- 因为要求select id列和age列，idx_index01_id_age索引包含了id的全部的记录的，即同过idx_index01_id_age索引也是可以得到id,age列的结果集

mysql gcdb@localhost:mytest> explain select id,age,name from  t_index01 where age>12 \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL    -- 查询 （id,age,name） 就没法使用（id，age）索引了，需要全表扫描age的值。

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 7

filtered      | 33.33

Extra         | Using where

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest>

mysql gcdb@localhost:mytest> explain select id,age from  t_index01 where id=1 and name='xuliuyan' \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ref

possible_keys | idx_index01_id_age

key           | idx_index01_id_age     -- 先用走索引通过id得到结果集，再用去name='xuliuyan'过滤

key_len       | 5

ref           | const

rows          | 2

filtered      | 14.29

Extra         | Using where

1 row in set

Time: 0.013s

mysql gcdb@localhost:mytest>

mysql gcdb@localhost:mytest> explain select id,age from  t_index01 where age =10 and name='xuliuyan' \G;

***************************[ 1. row ]***************************

id            | 1

select_type   | SIMPLE

table         | t_index01

partitions    | <null>

type          | ALL       --age和name列顺序age列开始，idx_index01_id_age索引扫描id列开始

possible_keys | <null>

key           | <null>

key_len       | <null>

ref           | <null>

rows          | 7

filtered      | 14.29

Extra         | Using where

1 row in set

Time: 0.012s

mysql gcdb@localhost:mytest>

五. information_schema

information_schema 数据库相当于一个数据字典。保存了表的元信息。

mysql gcdb@localhost:mytest> use information_schema

You are now connected to database "information_schema" as user "gcdb"

Time: 0.002s

mysql gcdb@localhost:information_schema> show tables

+---------------------------------------+

| Tables_in_information_schema          |

+---------------------------------------+

| CHARACTER_SETS                        |

| COLLATIONS                            |

| COLLATION_CHARACTER_SET_APPLICABILITY |

| COLUMNS                               |

| COLUMN_PRIVILEGES                     |

| ENGINES                               |

| EVENTS                                |

| FILES                                 |

| GLOBAL_STATUS                         |

| GLOBAL_VARIABLES                      |

| KEY_COLUMN_USAGE                      |

| OPTIMIZER_TRACE                       |

| PARAMETERS                            |

| PARTITIONS                            |

| PLUGINS                               |

| PROCESSLIST                           |

| PROFILING                             |

| REFERENTIAL_CONSTRAINTS               |

| ROUTINES                              |

| SCHEMATA                              |

| SCHEMA_PRIVILEGES                     |

| SESSION_STATUS                        |

| SESSION_VARIABLES                     |

| STATISTICS                            |

| TABLES                                |

| TABLESPACES                           |

| TABLE_CONSTRAINTS                     |

| TABLE_PRIVILEGES                      |

| TRIGGERS                              |

| USER_PRIVILEGES                       |

| VIEWS                                 |

| INNODB_LOCKS                          |

| INNODB_TRX                            |

| INNODB_SYS_DATAFILES                  |

| INNODB_FT_CONFIG                      |

| INNODB_SYS_VIRTUAL                    |

| INNODB_CMP                            |

| INNODB_FT_BEING_DELETED               |

| INNODB_CMP_RESET                      |

| INNODB_CMP_PER_INDEX                  |

| INNODB_CMPMEM_RESET                   |

| INNODB_FT_DELETED                     |

| INNODB_BUFFER_PAGE_LRU                |

| INNODB_LOCK_WAITS                     |

| INNODB_TEMP_TABLE_INFO                |

| INNODB_SYS_INDEXES                    |

| INNODB_SYS_TABLES                     |

| INNODB_SYS_FIELDS                     |

| INNODB_CMP_PER_INDEX_RESET            |

| INNODB_BUFFER_PAGE                    |

| INNODB_FT_DEFAULT_STOPWORD            |

| INNODB_FT_INDEX_TABLE                 |

| INNODB_FT_INDEX_CACHE                 |

| INNODB_SYS_TABLESPACES                |

| INNODB_METRICS                        |

| INNODB_SYS_FOREIGN_COLS               |

| INNODB_CMPMEM                         |

| INNODB_BUFFER_POOL_STATS              |

| INNODB_SYS_COLUMNS                    |

| INNODB_SYS_FOREIGN                    |

| INNODB_SYS_TABLESTATS                 |

+---------------------------------------+

61 rows in set

Time: 0.011s

--例如：查看employees表的索引信息

mysql gcdb@localhost:information_schema>  select CONSTRAINT_SCHEMA,CONSTRAINT_NAME,TABLE_SCHEMA,COLUMN_NAME,ORDINAL_POSITION from key_column_usage where table_schema='employees' limit 10;  -- 显示索引的信息

+-------------------+---------------------+--------------+-------------+------------------+

| CONSTRAINT_SCHEMA | CONSTRAINT_NAME     | TABLE_SCHEMA | COLUMN_NAME | ORDINAL_POSITION |

+-------------------+---------------------+--------------+-------------+------------------+

| employees         | PRIMARY             | employees    | dept_no     | 1                |

| employees         | dept_name           | employees    | dept_name   | 1                |

| employees         | PRIMARY             | employees    | emp_no      | 1                |

| employees         | PRIMARY             | employees    | dept_no     | 2                |

| employees         | dept_emp_ibfk_1     | employees    | emp_no      | 1                |

| employees         | dept_emp_ibfk_2     | employees    | dept_no     | 1                |

| employees         | PRIMARY             | employees    | emp_no      | 1                |

| employees         | PRIMARY             | employees    | dept_no     | 2                |

| employees         | dept_manager_ibfk_1 | employees    | emp_no      | 1                |

| employees         | dept_manager_ibfk_2 | employees    | dept_no     | 1                |

+-------------------+---------------------+--------------+-------------+------------------+

10 rows in set

Time: 0.013s

mysql gcdb@localhost:information_schema>

5.1 问题

5.1.1 哪张数据表中记录了Cardinality信息？

mysql gcdb@localhost:information_schema> select * from `STATISTICS` limit 2 \G;

***************************[ 1. row ]***************************

TABLE_CATALOG | def

TABLE_SCHEMA  | dbt3

TABLE_NAME    | customer

NON_UNIQUE    | 0

INDEX_SCHEMA  | dbt3

INDEX_NAME    | PRIMARY

SEQ_IN_INDEX  | 1

COLUMN_NAME   | c_custkey

COLLATION     | A

CARDINALITY   | 148473

SUB_PART      | <null>

PACKED        | <null>

NULLABLE      |

INDEX_TYPE    | BTREE

COMMENT       |

INDEX_COMMENT |

***************************[ 2. row ]***************************

TABLE_CATALOG | def

TABLE_SCHEMA  | dbt3

TABLE_NAME    | customer

NON_UNIQUE    | 1

INDEX_SCHEMA  | dbt3

INDEX_NAME    | i_c_nationkey

SEQ_IN_INDEX  | 1

COLUMN_NAME   | c_nationkey

COLLATION     | A

CARDINALITY   | 24

SUB_PART      | <null>

PACKED        | <null>

NULLABLE      | YES

INDEX_TYPE    | BTREE

COMMENT       |

INDEX_COMMENT |

2 rows in set

Time: 0.032s

5.1.2 判断线上的索引是否有创建的必要？

MySQL 5.7 STATISTICS表记录了各个索引的CARDINALITY值

SQL如下：

SELECT

    t.TABLE_SCHEMA,

    t.TABLE_NAME,

    INDEX_NAME,

    CARDINALITY,

    TABLE_ROWS,

    CARDINALITY / TABLE_ROWS AS SELECTIVITY

FROM

    TABLES t,

    (SELECT

        table_schema, table_name, index_name, cardinality

    FROM

        STATISTICS

    WHERE

        (table_schema , table_name, index_name, seq_in_index) IN (SELECT

                table_schema, table_name, index_name, MAX(seq_in_index)

            FROM

                STATISTICS

            GROUP BY table_schema , table_name , index_name)) s

WHERE

    t.table_schema = s.table_schema

        AND t.table_name = s.table_name

        AND t.table_schema = 'employees'

ORDER BY SELECTIVITY;

SQL解释：

从information_schema.TABLES表中获取table_schema, table_name, table_rows等信息。

mysql gcdb@localhost:information_schema>  show create table TABLES\G

***************************[ 1. row ]***************************

Table        | TABLES

Create Table | CREATE TEMPORARY TABLE `TABLES` (

  `TABLE_CATALOG` varchar(512) NOT NULL DEFAULT '',

  `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',   -- 表所在的库

  `TABLE_NAME` varchar(64) NOT NULL DEFAULT '',     -- 表名

  `TABLE_TYPE` varchar(64) NOT NULL DEFAULT '',

  `ENGINE` varchar(64) DEFAULT NULL,

  `VERSION` bigint(21) unsigned DEFAULT NULL,

  `ROW_FORMAT` varchar(10) DEFAULT NULL,

  `TABLE_ROWS` bigint(21) unsigned DEFAULT NULL,    -- 表的记录数

  `AVG_ROW_LENGTH` bigint(21) unsigned DEFAULT NULL,

  `DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,

  `MAX_DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,

  `INDEX_LENGTH` bigint(21) unsigned DEFAULT NULL,

  `DATA_FREE` bigint(21) unsigned DEFAULT NULL,

  `AUTO_INCREMENT` bigint(21) unsigned DEFAULT NULL,

  `CREATE_TIME` datetime DEFAULT NULL,

  `UPDATE_TIME` datetime DEFAULT NULL,

  `CHECK_TIME` datetime DEFAULT NULL,

  `TABLE_COLLATION` varchar(32) DEFAULT NULL,

  `CHECKSUM` bigint(21) unsigned DEFAULT NULL,

  `CREATE_OPTIONS` varchar(255) DEFAULT NULL,

  `TABLE_COMMENT` varchar(2048) NOT NULL DEFAULT ''

) ENGINE=MEMORY DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

2.从information.STATISTICS获取table_schema和 table_name 信息

mysql gcdb@localhost:information_schema> show create table STATISTICS \G;

***************************[ 1. row ]***************************

Table        | STATISTICS

Create Table | CREATE TEMPORARY TABLE `STATISTICS` (

  `TABLE_CATALOG` varchar(512) NOT NULL DEFAULT '',

  `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',  -- 表所在的库

  `TABLE_NAME` varchar(64) NOT NULL DEFAULT '',  -- 表名

  `NON_UNIQUE` bigint(1) NOT NULL DEFAULT '0',

  `INDEX_SCHEMA` varchar(64) NOT NULL DEFAULT '',

  `INDEX_NAME` varchar(64) NOT NULL DEFAULT '',   -- 索引名

  `SEQ_IN_INDEX` bigint(2) NOT NULL DEFAULT '0',

  `COLUMN_NAME` varchar(64) NOT NULL DEFAULT '',

  `COLLATION` varchar(1) DEFAULT NULL,

  `CARDINALITY` bigint(21) DEFAULT NULL,

  `SUB_PART` bigint(3) DEFAULT NULL,

  `PACKED` varchar(10) DEFAULT NULL,

  `NULLABLE` varchar(3) NOT NULL DEFAULT '',

  `INDEX_TYPE` varchar(16) NOT NULL DEFAULT '',

  `COMMENT` varchar(16) DEFAULT NULL,

  `INDEX_COMMENT` varchar(1024) NOT NULL DEFAULT ''

) ENGINE=MEMORY DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

将TABLES 和 STATISTICS 表中的table_schema和table_name相关联并通过Cardinality和table_rows 计算，即可得到对应索引名

3.1 因为存在复合索引，所以我们要取出复合索引中seq最大的哪个值,取出的cardinality值才是最大的

mysql gcdb@localhost:information_schema> SELECT

                                      ->     table_schema, table_name, index_name, MAX(seq_in_index)

                                      ->     FROM

                                      ->         information_schema.STATISTICS

                                      ->         GROUP BY table_schema , table_name , index_name limit 10;

+--------------+------------+-----------------------+-------------------+

| table_schema | table_name | index_name            | MAX(seq_in_index) |

+--------------+------------+-----------------------+-------------------+

| dbt3         | customer   | i_c_nationkey         | 1                 |

| dbt3         | customer   | PRIMARY               | 1                 |

| dbt3         | lineitem   | i_l_commitdate        | 1                 |

| dbt3         | lineitem   | i_l_orderkey          | 1                 |

| dbt3         | lineitem   | i_l_orderkey_quantity | 2                 |

| dbt3         | lineitem   | i_l_partkey           | 1                 |

| dbt3         | lineitem   | i_l_receiptdate       | 1                 |

| dbt3         | lineitem   | i_l_shipdate          | 1                 |

| dbt3         | lineitem   | i_l_suppkey           | 1                 |

| dbt3         | lineitem   | i_l_suppkey_partkey   | 2                 |

+--------------+------------+-----------------------+-------------------+

10 rows in set

Time: 0.023s

mysql gcdb@localhost:information_schema>

3.2 得到了最大的seq值，从而可以取出对应的cardinality值

mysql gcdb@localhost:information_schema>SELECT

    -> table_schema,

    -> table_name,

    -> index_name,

    -> cardinality

    -> FROM

    -> information_schema.STATISTICS

    -> WHERE

    -> ( table_schema, table_name, index_name, seq_in_index ) IN ( SELECT table_schema, table_name, index_name, MAX( seq_in_index ) FROM information_schema.STATISTICS GROUP BY table_schema, table_name, index_name )

    -> LIMIT 10;

+--------------+------------+-----------------------+-------------+

| table_schema | table_name | index_name            | cardinality |

+--------------+------------+-----------------------+-------------+

| dbt3         | customer   | PRIMARY               | 148473      |

| dbt3         | customer   | i_c_nationkey         | 24          |

| dbt3         | lineitem   | PRIMARY               | 5945476     |

| dbt3         | lineitem   | i_l_shipdate          | 2496        |

| dbt3         | lineitem   | i_l_suppkey_partkey   | 804391      |

| dbt3         | lineitem   | i_l_partkey           | 204210      |

| dbt3         | lineitem   | i_l_suppkey           | 12014       |

| dbt3         | lineitem   | i_l_receiptdate       | 2501        |

| dbt3         | lineitem   | i_l_orderkey          | 1503736     |

| dbt3         | lineitem   | i_l_orderkey_quantity | 5293710     |

+--------------+------------+-----------------------+-------------+

10 rows in set

Time: 0.130s

mysql gcdb@localhost:information_schema>

3.3 最后通过table_schema和table_name 让上述的信息和TABLES表进行关联,获取employees库的所有使用情况

mysql gcdb@localhost:information_schema> SELECT

    -> t.TABLE_SCHEMA,

    -> t.TABLE_NAME,

    -> INDEX_NAME,

    -> CARDINALITY,

    -> TABLE_ROWS,

    -> CARDINALITY / TABLE_ROWS AS SELECTIVITY

    -> FROM

    -> TABLES t,

    -> (

    -> SELECT

    -> table_schema,

    -> table_name,

    -> index_name,

    -> cardinality

    -> FROM

    -> STATISTICS

    -> WHERE

    -> ( table_schema, table_name, index_name, seq_in_index ) IN ( SELECT table_schema, table_name, index_name, MAX( seq_in_index ) FROM STATISTICS GROUP BY table_schema, table_name, index_name )

    -> ) s

    -> WHERE

    -> t.table_schema = s.TABLE_schema

    -> AND t.table_name = s.table_name

    -> AND t.table_schema = 'employees'

    -> ORDER BY

    -> SELECTIVITY;

+--------------+--------------+------------+-------------+------------+-------------+

| TABLE_SCHEMA | TABLE_NAME   | index_name | cardinality | TABLE_ROWS | SELECTIVITY |

+--------------+--------------+------------+-------------+------------+-------------+

| employees    | dept_emp     | dept_no    | 8           | 331143     | 0.0000      |

| employees    | dept_manager | dept_no    | 9           | 24         | 0.3750      |

| employees    | departments  | dept_name  | 9           | 9          | 1.0000      |

| employees    | titles       | PRIMARY    | 442843      | 442843     | 1.0000      |

| employees    | dept_manager | PRIMARY    | 24          | 24         | 1.0000      |

| employees    | newsal       | PRIMARY    | 2838045     | 2838045    | 1.0000      |

| employees    | dept_emp     | PRIMARY    | 331143      | 331143     | 1.0000      |

| employees    | departments  | PRIMARY    | 9           | 9          | 1.0000      |

| employees    | salaries     | PRIMARY    | 2838426     | 2838426    | 1.0000      |

| employees    | employees    | PRIMARY    | 299423      | 299423     | 1.0000      |

+--------------+--------------+------------+-------------+------------+-------------+

10 rows in set

Time: 1.589s

mysql gcdb@localhost:information_schema>

通过最后一列的SELECTIVITY是否接近1，判断该索引创建是否合理
注意：
- Cardinality和table_rows的值，都是通过随机采样，预估得到的
- 当analyze前后，Cardinality值相差较多，说明该索引是不应该被创建的(页上的记录数值分布不平)
- 推荐 SELECTIVITY 15% 以上是适合的

秒客网

015：索引、B+树

一. 索引

1. 索引的定义

2. 二分查找

二. Binary Tree(二叉树)

1. 二叉树的定义

2. 平衡二叉树（AVL-树）

需要通过旋转（左旋，右旋）来维护平衡二叉树的平衡，在添加和删除的时候需要有额外的开销。

三. B树/B+树

3.1. B树的定义

3.2. B+树的定义

3.3. B+树的作用

3.4. B+树的操作

3.5. B+树的扇出(fan out)

3.6. B+树存储数据举例

四. MySQL索引

4.1. MySQL 创建索引

4.2. MySQL 查看索引

4.3. Cardinality（基数）

4.4 复合索引

五. information_schema

5.1 问题

5.1.1 哪张数据表中记录了Cardinality信息？

5.1.2 判断线上的索引是否有创建的必要？

相关文章