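I. Regular Expressions

1.1. What is a regular expression

Regular expressions are a way of processing strings. Working line by line, and with the help of a few special symbols, they let users easily search for, delete, or replace a specific string.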

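1.2. Regular expressions versus wildcards

The following view from another user makes sense, so it is quoted directly:

Wildcards work at the system level and are mostly used on file names, for example with find, ls, and cp.

Regular expressions, by contrast, need support from the tool itself: egrep, awk, vi, perl. Text-filtering tools such as awk and sed all use regular expressions, and they operate on file contents. Not every tool (command) supports regular expressions.

Put simply, some commands support regular expressions and some do not.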
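
To make the difference concrete, here is a minimal sketch. The scratch directory and the file names abc.txt and bcd.txt are made up for this demo. The same pattern a* is read one way by the shell as a wildcard and another way by grep as a regular expression.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
dir=$(mktemp -d)
touch "$dir/abc.txt" "$dir/bcd.txt"

# Wildcard: the shell expands a* to the file names that start with "a".
glob_matches=$(cd "$dir" && echo a*)
echo "$glob_matches"

# Regular expression: grep looks at text content, not file names.
# As a regex, a* means "zero or more a", so every input line matches.
regex_matches=$(printf 'abc\nbcd\n' | grep -c 'a*')
echo "$regex_matches"

rm -r "$dir"
```

The wildcard picks only abc.txt, while the regex counts both lines, because "zero a" is also a match.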

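1.3. How the locale affects regular expressions

Different locales order characters by different rules, so a range such as [A-Z] may cover different characters depending on the locale. For example:

LANG=C, order: 0 1 2 3 4 ... A B C D ... Z a b c d ... z

LANG=zh_CN, order: 0 1 2 3 4 ... a A b B c C d D ... z Z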
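
The C ordering can be checked with a small sketch like the one below. It forces LC_ALL=C for the single command, so the result does not depend on which other locales happen to be installed. The sample words are invented for illustration.

```shell
# In the C locale, uppercase letters sort before lowercase ones.
sorted_c=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | paste -s -d ' ' -)
echo "$sorted_c"

# In the C locale, [A-Z] covers uppercase letters only.
upper_only=$(printf 'apple\nApple\n' | LC_ALL=C grep '^[A-Z]')
echo "$upper_only"
```

Setting LC_ALL=C in front of a command is a common way to get predictable ranges without changing the locale of the whole session.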

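1.4. Some special symbols

The following special symbols (character classes) avoid the influence of the locale. Some common ones:

[:alnum:]    all letters and digits: 0-9, A-Z, a-z
[:alpha:]    all letters: A-Z, a-z
[:blank:]    horizontal whitespace: space and TAB
[:cntrl:]    all control characters: CR, LF, TAB, DEL, etc.
[:digit:]    all digits: 0-9
[:graph:]    all printable characters except whitespace (space and TAB)
[:lower:]    all lowercase letters: a-z
[:print:]    all printable characters, including space
[:punct:]    all punctuation characters
[:space:]    all horizontal or vertical whitespace characters
[:upper:]    all uppercase letters: A-Z
[:xdigit:]   all hexadecimal digits: 0-9, A-F, a-f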
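
A short sketch of how these classes are used with grep. Note that a class name goes inside a bracket expression, so the pattern is written [[:digit:]], not [:digit:]. The sample lines are invented for illustration.

```shell
# Lines that contain at least one digit.
digits=$(printf 'abc\nroom 42\nXYZ\n' | grep '[[:digit:]]')
echo "$digits"

# Lines that start with an uppercase letter.
upper_start=$(printf 'abc\nroom 42\nXYZ\n' | grep '^[[:upper:]]')
echo "$upper_start"

# Lines made up of hexadecimal digits only.
hex_only=$(printf '1f\n1g\nBEEF\n' | grep '^[[:xdigit:]]*$')
echo "$hex_only"
```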

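II. Basic Regular Expressions

2.1. Practice with grep

2.1.1. Advanced grep features

grep [-A n] [-B n] 'search string' filename

-A: "after" plus a number n. Besides the matching line, also list the n lines after it. Written here as -An with no space; GNU grep also accepts -A n.

-B: "before" plus a number n. Besides the matching line, also list the n lines before it. Written here as -Bn with no space; GNU grep also accepts -B n.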

:/$ dmesg | grep -n 'eth'
1564:[    2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
1565:[    2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
1569:[    2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
:/$ dmesg | grep -n -A3 -B2 'eth'   # the number follows -A and -B directly, with no space
1562-[    2.364718] Console: switching to colour frame buffer device 100x37
1563-[    2.395386] [drm] Initialized vmwgfx 2.9.0 20150810 for 0000:00:0f.0 on minor 0
1564:[    2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
1565:[    2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
1566-[    2.427823] ahci 0000:02:05.0: version 3.0
1567-[    2.428781] ahci 0000:02:05.0: AHCI 0001.0300 32 slots 30 ports 6 Gbps 0x3fffffff impl SATA mode
1568-[    2.428784] ahci 0000:02:05.0: flags: 64bit ncq clo only 
1569:[    2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
1570-[    2.445075] scsi host3: ahci
1571-[    2.445243] scsi host4: ahci
1572-[    2.445375] scsi host5: ahci
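
Since dmesg output differs on every machine, a reproducible sketch may help. It uses seq to generate the numbers 1 to 10 as stand-in input (an assumption for the demo, not part of the original example). Matching lines are marked with ":" after the line number and context lines with "-".

```shell
# One line of context before the match and two after it.
context=$(seq 1 10 | grep -n -A2 -B1 '^5$')
echo "$context"

# The same request with a space between the option and the number.
spaced=$(seq 1 10 | grep -n -A 2 -B 1 '^5$')
echo "$spaced"
```

Both forms print the same four lines: 4-4, 5:5, 6-6, 7-7.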

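2.1.2. Basic regular expression exercises

These use the example file from 鸟哥 (VBird), regular_express.txt.

2.1.2.1 Finding a specific string and inverting the selection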

:~/test$ grep -n 'the' regular_express.txt 
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
:~/test$ grep -vn 'the' regular_express.txt   # inverted selection
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.^M
6:GNU is free air not free beer.^M
7:Her hair is very beauty.^M
9:Oh! The soup taste good.^M
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh!     My god!
14:The gd software is a library for drafting programs.^M
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
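
For readers without regular_express.txt at hand, the same two ideas can be tried on a three-line sample invented for this sketch. It also shows -i, which is not used above, to ignore case.

```shell
# -v inverts the match; "The dog" survives because the search is case-sensitive.
no_the=$(printf 'the cat\nThe dog\na bird\n' | grep -vn 'the')
echo "$no_the"

# Adding -i ignores case, so "The dog" is filtered out as well.
no_the_nocase=$(printf 'the cat\nThe dog\na bird\n' | grep -vin 'the')
echo "$no_the_nocase"
```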

2.1.2.2 Using [] to match a character set

:~/test$ grep -n 't[ae]st'  regular_express.txt   # [ae] stands for one character, either a or e
8:I can't finish the test.^M
9:Oh! The soup taste good.^M

:~/test$ grep -n 'oo' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
:~/test$ grep -n '[^g]oo'  regular_express.txt   # [^g] means "not g": find oo not preceded by g; lines 1 and 9 drop out
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

:~/test$ grep -n '[^a-z]oo'  regular_express.txt   # find oo not preceded by a lowercase letter
3:Football game is not use feet only.
:~/test$ grep -n '[^[:lower:]]oo' regular_express.txt   # [:lower:] is another way to write lowercase letters
3:Football game is not use feet only.

:~/test$ grep -n '[0-9]' regular_express.txt   # find digits
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
:~/test$ grep -n '[[:digit:]]' regular_express.txt   # [:digit:] is another way to write digits
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
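
The bracket forms above can be replayed on a small word list. The sample() helper and its words are hypothetical, chosen only for this sketch. It also shows why a line like "goooooogle" still matches [^g]oo: an o that follows another o is "not preceded by g".

```shell
# Hypothetical sample data for this sketch.
sample() { printf 'test\ntaste\ntost\nFoo\ngoo\nzoo\ngooo\n'; }

# [ae] is exactly one character: a or e.
a_or_e=$(sample | grep 't[ae]st')
echo "$a_or_e"

# oo preceded by any character other than g.
# "gooo" matches too, through its second and third o.
not_after_g=$(sample | grep '[^g]oo')
echo "$not_after_g"

# oo preceded by something that is not a lowercase letter.
not_after_lower=$(sample | grep '[^[:lower:]]oo')
echo "$not_after_lower"
```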

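2.1.2.3 The line-start ^ and line-end $ characters

Note: inside brackets, [^] means negation; outside brackets, as in ^[], ^ means the start of the line.

:~/test$ grep -n '^the'  regular_express.txt   # find lines that start with the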
12:the symbol '*' is represented as start.
:~/test$ grep -n '^[a-z]'  regular_express.txt   # lines that start with a lowercase letter
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
:~/test$ grep -n '^[^a-zA-Z]'  regular_express.txt   # lines that do not start with a letter
1:"Open Source" is a good mechanism to develop programs.
:~/test$ grep -n '\.$'  regular_express.txt   # lines that end with a dot; the dot is escaped because . is itself a special character
20:go! go! Let's go.
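
One thing to watch for with $: a line with a Windows-style ending carries a carriage return (shown as ^M) before the newline, so the dot is no longer the last character. The sketch below uses two invented lines to show the effect and one possible workaround, stripping the carriage returns with tr first. Whether your copy of regular_express.txt has such endings is an assumption to verify, for example with cat -A.

```shell
# Only the Unix-style line ends with a dot; the other ends with \r.
dot_end=$(printf 'dos line.\r\nunix line.\n' | grep -n '\.$')
echo "$dot_end"

# After deleting carriage returns, both lines match.
after_strip=$(printf 'dos line.\r\nunix line.\n' | tr -d '\r' | grep -n '\.$')
echo "$after_strip"

# ^ and $ together: count empty lines.
blank_count=$(printf 'a\n\nb\n' | grep -c '^$')
echo "$blank_count"
```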

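2.1.2.4 Any single character . and the repetition character *

* : repeats the preceding character zero or more times. For example, a* stands for "anywhere from no a to infinitely many a". This differs from wildcards, where * stands for zero or more arbitrary characters and a* means "a" or "a followed by any characters".

. : exactly one arbitrary character, which must be present

:~/test$ grep -n 'g..d'  regular_express.txt   # g..d: exactly two arbitrary characters between g and d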
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".

:~/test$ grep -n 'ooo*'  regular_express.txt   # ooo*: two or more o
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!

:~/test$ grep -n 'g.*g'  regular_express.txt   # g.*g: a string that starts with g and ends with g; ".*" can be read as zero or more arbitrary characters, much like * in wildcards
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
:~/test$ grep -n 'g*g'  regular_express.txt   # g*g does not require a g at both ends: g* is zero or more g, so any line containing a g matches
1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
13:Oh!     My god!
14:The gd software is a library for drafting programs.^M
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
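
The difference between ., * and .* is easy to mix up, so here is a compact sketch on five invented words (the sample() helper is hypothetical).

```shell
# Hypothetical sample data for this sketch.
sample() { printf 'good\nglad\ngd\ngod\ncat\n'; }

# g..d: exactly two characters between g and d.
two_any=$(sample | grep 'g..d')
echo "$two_any"

# go*d: g, then zero or more o, then d.
zero_or_more=$(sample | grep 'go*d')
echo "$zero_or_more"

# g*g: zero or more g followed by one g, so any line with a g matches.
any_g=$(sample | grep 'g*g')
echo "$any_g"
```

The first pattern keeps good and glad, the second keeps good, gd and god, and the third drops only cat.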

2.1.2.5 Limiting the number of consecutive repetitions with {}

:~/test$ grep -n 'go\{2,5\}g'  regular_express.txt   # \{2,5\}: 2 to 5 o
18:google is the best tools for search keyword.
:~/test$ grep -n 'go\{2\}g'  regular_express.txt   # \{2\}: exactly 2 o
18:google is the best tools for search keyword.
:~/test$ grep -n 'go\{2,\}g'  regular_express.txt   # \{2,\}: 2 or more o
18:google is the best tools for search keyword.
19:goooooogle yes!
:~/test$ grep -n 'go\{,5\}g'  regular_express.txt   # \{,5\}: 5 or fewer o
18:google is the best tools for search keyword.
:~/test$ grep -n 'go\{,10\}g'  regular_express.txt   # \{,10\}: 10 or fewer o
18:google is the best tools for search keyword.
19:goooooogle yes!
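
The interval forms can be compared side by side on an invented list of words with 0, 1, 2, 3 and 6 o between two g (the sample() helper is hypothetical). For the "at most" case the sketch writes the lower bound explicitly as \{0,3\}; the \{,n\} shorthand used above works in GNU grep, but as far as I know it is an extension and may not be portable.

```shell
# Hypothetical sample data: 0, 1, 2, 3 and 6 o between the two g.
sample() { printf 'gg\ngog\ngoog\ngooog\ngoooooog\n'; }

two_to_three=$(sample | grep 'go\{2,3\}g')   # 2 to 3 o
exactly_two=$(sample | grep 'go\{2\}g')      # exactly 2 o
two_or_more=$(sample | grep 'go\{2,\}g')     # 2 or more o
at_most_three=$(sample | grep 'go\{0,3\}g')  # 0 to 3 o

echo "$two_to_three"
echo "$exactly_two"
echo "$two_or_more"
echo "$at_most_three"
```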

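2. Summary of basic regular expression characters

^word : word at the start of a line

word$ : word at the end of a line

. : exactly one arbitrary character

\ : escape

* : zero or more of the preceding character

[list] : one character from list

[n1-n2] : one character within the range

[^list] : negation, one character not in list

\{n1,n2\} : n1 to n2 consecutive occurrences of the preceding character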

3. The sed command

A pipeline command that can replace, delete, or add data and select specific lines.
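
A minimal sketch of three of these operations on an invented three-line input. Adding lines (the a and i commands) is left out here because its syntax differs somewhat between sed implementations.

```shell
# Replace: substitute the first "two" on each line with "2".
replaced=$(printf 'one\ntwo\nthree\n' | sed 's/two/2/')
echo "$replaced"

# Delete: drop line 2.
deleted=$(printf 'one\ntwo\nthree\n' | sed '2d')
echo "$deleted"

# Select: -n suppresses default output, p prints lines 2 to 3 only.
selected=$(printf 'one\ntwo\nthree\n' | sed -n '2,3p')
echo "$selected"
```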

4. The awk command

Processes each line field by field.
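
A small sketch of field handling, using invented records. By default awk splits each line on whitespace and exposes the fields as $1, $2, and so on; -F sets a different separator.

```shell
# Print the first field of every line.
names=$(printf 'alice 30\nbob 25\n' | awk '{print $1}')
echo "$names"

# Print the first field only where the second field is greater than 26.
older=$(printf 'alice 30\nbob 25\n' | awk '$2 > 26 {print $1}')
echo "$older"

# Use ":" as the field separator and print fields 1 and 3.
colon_fields=$(printf 'root:x:0\n' | awk -F: '{print $1, $3}')
echo "$colon_fields"
```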

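5. Extended regular expressions

+    one or more of the preceding character

?    zero or one of the preceding character

|    alternation (or), e.g. 'glad|good'

()   grouping, e.g. g(la|oo)d

()+  one or more repetitions of the group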
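
With grep, these operators are available through grep -E (egrep is the older name for the same thing). A sketch on invented words; the sample() helper is hypothetical.

```shell
# Hypothetical sample data for this sketch.
sample() { printf 'good\nglad\ngd\ngod\n'; }

one_or_more=$(sample | grep -E 'go+d')       # at least one o
zero_or_one=$(sample | grep -E 'go?d')       # at most one o
either=$(sample | grep -E 'glad|good')       # one word or the other
grouped=$(sample | grep -E 'g(la|oo)d')      # same result, written with a group
repeated_group=$(printf 'AxyzxyzC\nAC\n' | grep -E 'A(xyz)+C')  # group repeated one or more times

echo "$one_or_more"
echo "$zero_or_one"
echo "$either"
echo "$grouped"
echo "$repeated_group"
```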
