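This post builds the database and data for a tiny fictional online bookstore. It serves as the source that later MySQL scripts run against, which makes later MySQL and SQL practice easier.
Building it mainly involves managing MySQL database objects from the command line: databases, tables, indexes, foreign keys, and so on. The other, more important part is how to mock the data for each table.
Dump script for the fictional bookstore database: GitHub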
Database
The database for the fictional bookstore is named mysql_practice.
Syntax for creating a database:
CREATE DATABASE [IF NOT EXISTS] database_name
[CHARACTER SET charset_name]
[COLLATE collation_name]
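- IF NOT EXISTS: optional; avoids an error when the database already exists.
- CHARACTER SET: optional; a default is used when it is not specified.
  - List the character sets supported by the current MySQL server:
    show character set; -- option 1
    show charset; -- option 2
    show char set; -- option 3
- COLLATE: the set of rules for comparing strings in a particular character set; optional; a default is used when it is not specified.
  - List the collations of a character set:
    show collation like 'utf8%';
  - Collation naming convention: character_set_name_ci, character_set_name_cs, or character_set_name_bin. _ci means case-insensitive, _cs means case-sensitive, and _bin means comparison by encoded values.
- Example:
  CREATE DATABASE my_test_tb CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
TODO: there is quite a bit more to say about character sets and collations; they will get a separate post later.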
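The suffix convention for collation names can be sketched in a few lines of Python. This is only an illustration of the naming rule described above; the function name `collation_comparison` is hypothetical, and the script does not talk to a MySQL server.

```python
def collation_comparison(collation_name: str) -> str:
    """Classify a collation name by its suffix (_ci, _cs or _bin)."""
    suffixes = {
        "_ci": "case-insensitive",
        "_cs": "case-sensitive",
        "_bin": "binary",
    }
    for suffix, meaning in suffixes.items():
        if collation_name.endswith(suffix):
            return meaning
    return "unknown"


for name in ("utf8mb4_0900_ai_ci", "utf8mb4_0900_as_cs", "utf8mb4_bin"):
    print(name, "->", collation_comparison(name))
```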
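Selecting a database at login
mysql -uroot -D database_name -p
Selecting a database after login
use database_name;
Showing the currently selected database
select database();
Creating a new database
create database if not exists mysql_practice;
The following statement shows how the database was created:
show create database mysql_practice;
As the output shows, if no character set and collate are specified when the database is created, a default pair is assigned.
Listing all databases visible to the current account
show databases;
Dropping a database
drop database if exists mysql_practice;
In MySQL, schema is a synonym for database, so the following statement also drops the database:
drop schema if exists mysql_practice;
Table
MySQL syntax for creating a table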
CREATE TABLE [IF NOT EXISTS] table_name(
column_1_definition,
column_2_definition,
...,
table_constraints
) ENGINE=storage_engine;
Column definition syntax:
column_name data_type(length) [NOT NULL] [DEFAULT value] [AUTO_INCREMENT] column_constraint;
Table constraints: UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY.
Viewing a table definition
desc table_name;
Creating the mysql_practice tables
USE mysql_practice;
DROP TABLE IF EXISTS customer_order;
DROP TABLE IF EXISTS book;
DROP TABLE IF EXISTS book_category;
DROP TABLE IF EXISTS customer_address;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS region;
-- region; data source: https://github.com/xiangyuecn/AreaCity-JsSpider-StatsGov
CREATE TABLE IF NOT EXISTS region(
id INT AUTO_INCREMENT,
pid INT NOT NULL,
deep INT NOT NULL,
name VARCHAR(200) NOT NULL,
pinyin_prefix VARCHAR(10) NOT NULL,
pinyin VARCHAR(200) NOT NULL,
ext_id VARCHAR(100) NOT NULL,
ext_name VARCHAR(200) NOT NULL,
PRIMARY KEY(id)
);
-- customer
CREATE TABLE IF NOT EXISTS customer(
id INT AUTO_INCREMENT,
no VARCHAR(50) NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
status VARCHAR(20) NOT NULL,
phone_number VARCHAR(20) NULL,
updated_at DATETIME NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY(id),
unique(no)
) ENGINE=INNODB;
-- customer address
CREATE TABLE IF NOT EXISTS customer_address(
id INT AUTO_INCREMENT,
customer_id INT NOT NULL,
area_id INT NULL,
address_detail VARCHAR(200) NULL,
is_default bit NOT NULL,
updated_at DATETIME NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(customer_id) REFERENCES customer (id) ON UPDATE RESTRICT ON DELETE CASCADE
) ENGINE=INNODB;
-- book category
CREATE TABLE IF NOT EXISTS book_category(
id INT AUTO_INCREMENT,
code VARCHAR(200) NOT NULL,
name VARCHAR(200) NOT NULL,
parent_id INT NULL,
deep INT NULL,
updated_at DATETIME NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY(id)
);
-- book
CREATE TABLE IF NOT EXISTS book(
id INT AUTO_INCREMENT,
category_id INT NOT NULL,
no VARCHAR(50) NOT NULL,
name VARCHAR(200) NOT NULL,
status VARCHAR(50) NOT NULL,
unit_price DOUBLE NOT NULL,
author VARCHAR(50) NULL,
publish_date DATETIME NULL,
publisher VARCHAR(200) NOT NULL,
updated_at DATETIME NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY (category_id) REFERENCES book_category (id) ON UPDATE RESTRICT ON DELETE CASCADE
);
-- orders
CREATE TABLE IF NOT EXISTS customer_order(
id INT AUTO_INCREMENT,
no VARCHAR(50) NOT NULL,
customer_id INT NOT NULL,
book_id INT NOT NULL,
quantity INT NOT NULL,
total_price DOUBLE NOT NULL,
discount DOUBLE NULL,
order_date DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY (customer_id) REFERENCES customer(id) ON UPDATE RESTRICT ON DELETE CASCADE,
FOREIGN KEY (book_id) references book (id) on update restrict on delete cascade
) ENGINE=INNODB;
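The script above drops child tables before the tables they reference and creates them in the opposite direction, because a table referenced by a foreign key cannot be dropped while the referencing table still exists. A minimal Python sketch of that ordering, using the foreign keys declared above (the helper names `create_order` and `drop_order` are hypothetical, and `graphlib` requires Python 3.9 or later):

```python
from graphlib import TopologicalSorter

# table -> set of parent tables it references through a FOREIGN KEY
FOREIGN_KEYS = {
    "region": set(),
    "customer": set(),
    "customer_address": {"customer"},
    "book_category": set(),
    "book": {"book_category"},
    "customer_order": {"customer", "book"},
}


def create_order(foreign_keys):
    """Parents first: a table comes after every table it references."""
    return list(TopologicalSorter(foreign_keys).static_order())


def drop_order(foreign_keys):
    """Children first: the creation order reversed."""
    return list(reversed(create_order(foreign_keys)))


print("create:", create_order(FOREIGN_KEYS))
print("drop:  ", drop_order(FOREIGN_KEYS))
```

The exact order among unrelated tables may differ from the script, but every child still comes before its parent in the drop list.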
Importing the region data
Download the region CSV data: [Level 3] province/city/district data download.
Import statement:
LOAD DATA INFILE '/tmp/ok_data_level3.csv'
INTO TABLE region
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
If the import fails with:
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
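- Locate the MySQL configuration file my.cnf with the command below (mdfind is the macOS Spotlight search tool):
mdfind -name my.cnf
- Fix: adjust the secure_file_priv option named in the error message in that file (I have not actually tested this; most people use LOAD DATA LOCAL INFILE instead).
Alternatively, use LOAD DATA LOCAL INFILE in place of LOAD DATA INFILE: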
LOAD DATA LOCAL INFILE '/tmp/ok_data_level3.csv'
INTO TABLE region
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
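If that fails with:
Error Code: 3948. Loading local data is disabled; this must be enabled on both the client and server sides
or with:
ERROR 1148 (42000): The used command is not allowed with this MySQL version
- Check the setting:
show variables like "local_infile";
- Change the setting on the server (as the first message says, the client must allow it too, for example by starting the mysql client with --local-infile=1):
set global local_infile = 1;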
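Before importing, it can help to check how the CSV will be split with the same options as the LOAD DATA statement: comma-separated fields, optional double quotes, and one header row skipped. The sketch below uses a made-up two-row sample whose columns follow the region table; the real file's content and column order are assumptions here, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample in the column order of the region table
SAMPLE = (
    'id,pid,deep,name,pinyin_prefix,pinyin,ext_id,ext_name\n'
    '1,0,0,"Sample Province",s,"sample province",100000000000,"Sample Province"\n'
    '2,1,1,"Sample, City",s,"sample city",100100000000,"Sample, City"\n'
)


def read_rows(text):
    """Split the text like FIELDS TERMINATED BY ',' ENCLOSED BY '"' IGNORE 1 ROWS."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    next(reader)  # skip the header row
    return list(reader)


for row in read_rows(SAMPLE):
    print(len(row), row)
```

A comma inside a quoted field stays in that field, so every row should still have the eight columns the region table expects.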
Generating customer data
Create a stored procedure (SP):
USE mysql_practice;
DROP PROCEDURE IF EXISTS sp_generate_customers;
DELIMITER $$
CREATE PROCEDURE sp_generate_customers()
BEGIN
-- Generate 10000 customers (100 first names x 100 last names), each with one customer_address row
set @fNameIndex = 1;
set @lNameIndex = 1;
loop_label_f: LOOP
IF @fNameIndex > 100 THEN
LEAVE loop_label_f;
END IF;
set @fName = ELT(@fNameIndex, "James","Mary","John","Patricia","Robert","Linda","Michael","Barbara","William","Elizabeth","David","Jennifer","Richard","Maria","Charles","Susan","Joseph","Margaret","Thomas","Dorothy","Christopher","Lisa","Daniel","Nancy","Paul","Karen","Mark","Betty","Donald","Helen","George","Sandra","Kenneth","Donna","Steven","Carol","Edward","Ruth","Brian","Sharon","Ronald","Michelle","Anthony","Laura","Kevin","Sarah","Jason","Kimberly","Matthew","Deborah","Gary","Jessica","Timothy","Shirley","Jose","Cynthia","Larry","Angela","Jeffrey","Melissa","Frank","Brenda","Scott","Amy","Eric","Anna","Stephen","Rebecca","Andrew","Virginia","Raymond","Kathleen","Gregory","Pamela","Joshua","Martha","Jerry","Debra","Dennis","Amanda","Walter","Stephanie","Patrick","Carolyn","Peter","Christine","Harold","Marie","Douglas","Janet","Henry","Catherine","Carl","Frances","Arthur","Ann","Ryan","Joyce","Roger","Diane");
loop_label_last: LOOP
IF @lNameIndex > 100 THEN
LEAVE loop_label_last;
END IF;
SET @lName = ELT(@lNameIndex, "Smith","Johnson","Williams","Jones","Brown","Davis","Miller","Wilson","Moore","Taylor","Anderson","Thomas","Jackson","White","Harris","Martin","Thompson","Garcia","Martinez","Robinson","Clark","Rodriguez","Lewis","Lee","Walker","Hall","Allen","Young","Hernandez","King","Wright","Lopez","Hill","Scott","Green","Adams","Baker","Gonzalez","Nelson","Carter","Mitchell","Perez","Roberts","Turner","Phillips","Campbell","Parker","Evans","Edwards","Collins","Stewart","Sanchez","Morris","Rogers","Reed","Cook","Morgan","Bell","Murphy","Bailey","Rivera","Cooper","Richardson","Cox","Howard","Ward","Torres","Peterson","Gray","Ramirez","James","Watson","Brooks","Kelly","Sanders","Price","Bennett","Wood","Barnes","Ross","Henderson","Coleman","Jenkins","Perry","Powell","Long","Patterson","Hughes","Flores","Washington","Butler","Simmons","Foster","Gonzales","Bryant","Alexander","Russell","Griffin","Diaz","Hayes");
-- insert into customer
INSERT INTO customer(no, first_name, last_name, status, phone_number, updated_at, created_at)
values(
REPLACE(LEFT(uuid(), 16), '-', ''),
@fName,
@lName,
'ACTIVE',
null,
curdate(),
curdate()
);
-- insert into customer_address
set @randomArea = 0;
SELECT id into @randomArea FROM region where deep = 2 ORDER BY RAND() LIMIT 1;
INSERT INTO customer_address(customer_id, area_id, address_detail, is_default, updated_at, created_at)
VALUES(
@@Identity,
@randomArea,
'',
1,
curdate(),
curdate()
);
set @lNameIndex = @lNameIndex + 1;
END LOOP loop_label_last;
SET @lNameIndex = 1; -- Note: reset the last name index to 1 for the next first name.
SET @fNameIndex = @fNameIndex + 1;
END LOOP loop_label_f;
-- update address_detail in customer_address
UPDATE customer_address ca
JOIN region r on ca.area_id = r.id and r.deep = 2
join region r2 on r.pid = r2.id and r2.deep = 1
join region r3 on r2.pid = r3.id and r3.deep = 0
SET ca.address_detail = concat(r3.ext_name, r2.ext_name, r.ext_name);
END $$
DELIMITER ;
Call the SP:
call sp_generate_customers();
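The two nested loops in the SP amount to a Cartesian product of the first-name and last-name lists, with a short identifier cut from a UUID for each row. A rough Python analogue is sketched below; the short name lists stand in for the 100-entry lists, `make_no` and `generate_customers` are hypothetical names, and Python's `uuid.uuid1()` is assumed to be close enough to MySQL's `uuid()` for illustration.

```python
import itertools
import uuid


def make_no() -> str:
    """Python analogue of REPLACE(LEFT(uuid(), 16), '-', '')."""
    return str(uuid.uuid1())[:16].replace("-", "")


def generate_customers(first_names, last_names):
    """One customer per (first name, last name) pair, outer loop over first names."""
    return [
        {"no": make_no(), "first_name": first, "last_name": last, "status": "ACTIVE"}
        for first, last in itertools.product(first_names, last_names)
    ]


customers = generate_customers(["James", "Mary"], ["Smith", "Johnson", "Williams"])
for customer in customers:
    print(customer)
```

With 100 first names and 100 last names the same product yields the 10000 customers the SP inserts.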
Generating book categories and book data
Step 0: manually insert the book categories into the book_category table
INSERT INTO book_category(code,name, parent_id, deep, updated_at, created_at)
VALUES
('BOOK', 'Book', 0, 0, curdate(), curdate()),
('BOOK_CODE', 'Code Book', 1, 1, curdate(), curdate()),
('BOOK_CHIDREN', 'Children Book', 1, 1, curdate(), curdate()),
('BOOK_SCIENCE', 'Science Book', 1, 1, curdate(), curdate());
Step 1: write a small crawler in Python to scrape product information from a bookstore.
The script below scrapes the list of books returned by a Dangdang search for the keyword "科学" (science).
import csv

import requests
from bs4 import BeautifulSoup


def crawl(url):
    res = requests.get(url)
    res.encoding = 'gb18030'
    soup = BeautifulSoup(res.text, 'html.parser')
    n = 0
    section = soup.find('ul', id='component_59')
    allLIs = section.find_all('li')
    # print(allLIs)
    # newline='' lets the csv module control the line endings itself
    with open('output_science.csv', 'w', encoding='utf8', newline='') as f:
        # the content contains ',', so '#' is used as the CSV delimiter
        csv_writer = csv.writer(f, delimiter='#')
        csv_writer.writerow(['No.', 'Title', 'Price', 'Author', 'Publish date', 'Publisher'])
        for books in allLIs:
            title = books.select('.name')[0].text.strip().split(' ', 1)[0].strip()
            price = books.select('.search_pre_price')[0].text.strip('¥')
            # the author block reads "author/publish date/publisher"
            authorInfo = books.select('.search_book_author')[0].text.strip().split('/')
            author = authorInfo[0]
            publishDate = authorInfo[1]
            publisher = authorInfo[2]
            n += 1
            csv_writer.writerow([n, title, price, author, publishDate, publisher])


url = 'http://search.dangdang.com/?key=%BF%C6%D1%A7&act=input'
crawl(url)
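The crawler indexes the split author block directly, which raises an IndexError when a listing lacks the publish date or publisher. A slightly more defensive parse could look like the sketch below; `parse_author_info` is a hypothetical helper and the sample strings are made up, not real page content.

```python
def parse_author_info(text):
    """Split 'author/publish date/publisher', padding missing parts with ''."""
    parts = [part.strip() for part in text.strip().split("/")]
    parts += [""] * (3 - len(parts))  # pad when fewer than three parts are present
    author, publish_date, publisher = parts[:3]
    return author, publish_date, publisher


print(parse_author_info("Jane Doe /2019-05-01 /Example Press"))
print(parse_author_info("Jane Doe"))
```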
Step 2: import the CSV data into the MySQL table mock_science.
CREATE TABLE `mock_science` (
`id` int(11) NOT NULL,
`name` varchar(200) DEFAULT NULL,
`price` double DEFAULT NULL,
`author` varchar(100) DEFAULT NULL,
`publish_date` varchar(100) DEFAULT NULL,
`publisher` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Step 3: insert the science books into the book table
INSERT book(category_id, no, name, status,unit_price, author,publish_date,publisher, updated_at, created_at)
SELECT
4,
REPLACE(LEFT(uuid(), 16), '-', ''),
name,
'ACTIVE',
price,
author,
publish_date,
publisher,
curdate(),
curdate()
FROM
mock_science;
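Repeat steps 1 to 3 to insert more books. The practice database ended up with the first page of search results for three keywords: JAVA, 儿童 (children), and 科学 (science).
Generating order data
An SP that generates random order data (note: the data it generates still needs further processing):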
USE mysql_practice;
DROP PROCEDURE IF EXISTS sp_generate_orders;
DELIMITER $$
-- Reference: https://www.mysqltutorial.org/select-random-records-database-table.aspx
-- Generate orders for every day from 2020-03-01 through today.
-- Each day pairs 100-149 random customers with 1-5 random books, i.e. 100 to 745 orders per day.
CREATE PROCEDURE sp_generate_orders()
BEGIN
SET @startDate = '2020-03-01';
SET @endDate = curdate();
loop_label_p: LOOP
IF @startDate > @endDate THEN
LEAVE loop_label_p;
END IF;
SET @randCustomerTotal = FLOOR(RAND()*50) + 100;
SET @randBookTotal = FLOOR(RAND()*5) + 1;
SET @randQty = FLOOR(RAND()*3) + 1;
SET @query1 = CONCAT('INSERT INTO customer_order(no, customer_id, book_id, quantity, total_price,discount, order_date, updated_at, created_at)');
SET @query1 = CONCAT(@query1, ' select ', "'", uuid(), "'",', c.id, p.id,', @randQty, ', 0, 0, ', "'",@startDate,"'", ',', "'",curdate(),"'" ,',', "'",curdate(),"'");
SET @query1 = CONCAT(@query1, ' FROM (select id from customer ORDER BY RAND() LIMIT ', @randCustomerTotal,') c join ');
SET @query1 = CONCAT(@query1, ' (select id from book order by rand() limit ', @randBookTotal,') p ');
SET @query1 = CONCAT(@query1, 'where c.id is not null');
PREPARE increased FROM @query1;
EXECUTE increased;
SET @startDate = DATE_ADD(@startDate, INTERVAL 1 DAY);
END LOOP loop_label_p;
END $$
DELIMITER ;
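The number of orders per day follows from the two RAND() expressions in the SP: the generated statement joins every picked customer with every picked book. A small Python simulation of that arithmetic is sketched below, assuming the customer table has at least 149 rows and the book table at least 5; `orders_for_one_day` is a hypothetical name.

```python
import math
import random


def orders_for_one_day(rng):
    """Mirror FLOOR(RAND()*50) + 100 customers joined with FLOOR(RAND()*5) + 1 books."""
    customers = math.floor(rng.random() * 50) + 100  # 100..149
    books = math.floor(rng.random() * 5) + 1  # 1..5
    return customers * books  # every picked customer is paired with every picked book


rng = random.Random(0)
counts = [orders_for_one_day(rng) for _ in range(1000)]
print(min(counts), max(counts))
```

The smallest possible day is 100 x 1 = 100 orders and the largest is 149 x 5 = 745.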
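In total this generates hundreds of thousands or even millions of order rows. It is best to add a few simple indexes first, otherwise the queries are too slow; they can be added right after the tables are created.
Add the indexes: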
ALTER TABLE book ADD INDEX idx_unit_price(unit_price);
ALTER TABLE customer_order ADD INDEX idx_order_no(no);
ALTER TABLE customer_order ADD INDEX idx_order_date(order_date);
ALTER TABLE customer_order ADD INDEX idx_quantity(quantity);
Update the order no:
-- note: it is better to add the indexes first, otherwise this will be slow.
-- update order no
update customer_order
set no = concat(REPLACE(LEFT(no, 16), '-', ''), customer_id, book_id)
where no is not null;
If you do not want duplicate order numbers, the following SQL reassigns them:
-- fix duplicate order no
update customer_order co
join
(select no from customer_order co2 group by co2.no having count(*) > 1) as cdo
on co.no = cdo.no
set co.no = concat(REPLACE(LEFT(uuid(), 16), '-', ''), customer_id, book_id);
If duplicate order numbers remain, keep running the statement above until none are left.
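The repeat-until-clean idea can be sketched in Python on an in-memory list. This only illustrates the loop, not the SQL itself; `dedupe_order_nos` is a hypothetical helper, the sample rows are made up, and `uuid.uuid1()` stands in for MySQL's `uuid()`.

```python
import uuid
from collections import Counter


def dedupe_order_nos(orders):
    """Reassign 'no' on rows whose number is shared; return how many passes ran."""
    passes = 0
    while True:
        counts = Counter(order["no"] for order in orders)
        duplicated = {no for no, count in counts.items() if count > 1}
        if not duplicated:
            return passes
        for order in orders:
            if order["no"] in duplicated:
                prefix = str(uuid.uuid1())[:16].replace("-", "")
                order["no"] = f"{prefix}{order['customer_id']}{order['book_id']}"
        passes += 1


orders = [
    {"no": "A", "customer_id": 1, "book_id": 2},
    {"no": "A", "customer_id": 1, "book_id": 3},
    {"no": "B", "customer_id": 2, "book_id": 2},
]
print(dedupe_order_nos(orders), [order["no"] for order in orders])
```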
Update total_price in the order table:
-- update total price
update customer_order co
join book b
on co.book_id = b.id
SET co.total_price = co.quantity * b.unit_price;
At this point the tables and their mock data are essentially complete. Back them up with mysqldump:
mysqldump -u [username] -p[password] [database_name] > [dump_file.sql]
Next steps
- Views (View)
- Stored procedures (Stored Procedure)
- Functions (Function)
- Triggers (Trigger)
- Scheduled tasks (Job)
References
- MySQL Character Set
- MySQL Collation
- Generating random names in MySQL
- MySQL LOOP
- MySQL Select Random Records