Using Linux virtual network devices

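1. Common types of Linux virtual network devices

Common virtual network devices include bridge, tun/tap, veth pairs, macvlan, and macvtap. There is a well-written, well-illustrated blog post on the topic (虚拟网络设备), but it is a translation and is incomplete. For the full picture, see the original English article: Introduction to Linux interfaces for virtual networking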

2. Bridge devices

A bridge device can be thought of as a simple switch. Creating one is straightforward:

ip link add dev br0 type bridge # add a bridge device named br0
ip link set tap0 master br0 # attach the interface tap0 to br0
ip link set eth0 master br0 # attach the interface eth0 to br0

If libvirtd is installed, it creates a bridge device named virbr0 and a tap device named virbr0-nic.
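
To see which interfaces are actually attached to a bridge such as br0 or virbr0, sysfs can be read directly. The following is a minimal sketch, not a definitive tool: the function name list_bridge_ports and the SYSFS_NET override are my own inventions for illustration, and it relies on bridges exposing a "bridge" directory and a "brif" directory under /sys/class/net.

```shell
#!/bin/sh
# List every bridge on the host together with the interfaces attached to it.
# SYSFS_NET is a hypothetical override so the function can be pointed at a
# test directory; on a real system it defaults to /sys/class/net.
list_bridge_ports() {
    sysfs_net="${SYSFS_NET:-/sys/class/net}"
    for dev in "$sysfs_net"/*; do
        # Only bridge devices have a "bridge" subdirectory.
        [ -d "$dev/bridge" ] || continue
        for port in "$dev"/brif/*; do
            # An empty bridge leaves the glob unexpanded; skip that case.
            [ -e "$port" ] || continue
            echo "$(basename "$dev") $(basename "$port")"
        done
    done
}
```

Calling list_bridge_ports prints one "bridge port" pair per line. On a host with libvirtd this would be expected to include a line pairing virbr0 with virbr0-nic.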

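3. Using tap devices

Despite all the blog posts out there, I still could not work out what a tap device actually is. After some searching and hands-on experimenting, I now roughly understand how it works.

A tap device works at layer 2. You can think of it as an interface for talking to layer 2 devices, or as a special kind of network card. However, this special network card needs a program to drive it, such as virtual machine software or VPN software. One end of the tap device is connected to a network device, usually a bridge; the other end is connected to the application that uses the tap device, such as the virtual machine software. A tap device works even without an IP address configured. Note: it is the virtual machine software that uses the tap device, while the guest inside uses a virtual network card that the virtual machine software emulates and associates with the tap device (a bit of a tongue twister, sorry!).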

A hands-on example

Create a tap device named tap0, then use tap0 in a KVM virtual machine.

ip tuntap add dev tap0 mode tap # create tap0
ip link set tap0 master virbr0 # attach tap0 to virbr0, the bridge that ships with libvirtd
ip link set tap0 up # bring tap0 up
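
Before handing tap0 to a virtual machine, it can be worth checking that the device really is a layer 2 tap and not a layer 3 tun. The sketch below is my own illustration: the function name tuntap_mode and the SYSFS_NET override are hypothetical, and it assumes the tun driver's tun_flags attribute in sysfs, which holds a hex value whose low bits are IFF_TUN (0x0001) or IFF_TAP (0x0002).

```shell
#!/bin/sh
# Report whether a device is a tun (layer 3) or tap (layer 2) interface by
# decoding the tun_flags attribute that the tun driver exposes in sysfs.
# SYSFS_NET is a hypothetical override used only for testing.
tuntap_mode() {
    flags_file="${SYSFS_NET:-/sys/class/net}/$1/tun_flags"
    if [ ! -r "$flags_file" ]; then
        # Ordinary interfaces have no tun_flags attribute.
        echo "not-tuntap"
        return 1
    fi
    flags=$(cat "$flags_file")   # a hex string such as 0x1002
    # IFF_TUN is 0x0001 and IFF_TAP is 0x0002 in <linux/if_tun.h>.
    if [ $(( $flags & 0x0002 )) -ne 0 ]; then
        echo "tap"
    elif [ $(( $flags & 0x0001 )) -ne 0 ]; then
        echo "tun"
    else
        echo "unknown"
    fi
}
```

For the device created above, tuntap_mode tap0 should print tap. The same information is visible in the output of ip -d link show tap0.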

Create a virtual machine, edit the XML of its network interface, and fill in the following:

<interface type="ethernet">
  <mac address="52:54:00:25:57:81"/>
  <target dev="tap0" managed="no"/>
  <model type="virtio"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>

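Note that managed="no" here tells libvirtd not to manage the device. Otherwise libvirtd modifies the properties of tap0 and the connection fails.

Then start the virtual machine and check whether the network works.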

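4. Using veth pairs

I already ran into veth pairs in my earlier systemd-nspawn experiments, through the --network-veth option. This option creates a pair of veth devices on the host. The two ends of the pair are connected to each other, so the pair can carry traffic between different network namespaces, for example between containers. After starting the centos container, a new device appears under /sys/class/net on the host; ip link shows it as: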

ve-centos7@if2

This is the end of the veth pair that lives in the current namespace; the other end lives in the container's namespace. Running ip addr inside the centos7 container shows:

host0@if5
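
The @ifN suffix in these names is the interface index of the peer on the other side of the pair. The sketch below reads the same numbers from sysfs, where for a veth device the iflink attribute holds the peer's index. The function name veth_peer_index and the SYSFS_NET override are hypothetical names of mine, used only for illustration.

```shell
#!/bin/sh
# Print the interface index of a veth device and the index of its peer.
# For veth devices, sysfs "iflink" holds the peer's ifindex, which is the
# number shown after "@if" in the output of ip link.
# SYSFS_NET is a hypothetical override used only for testing.
veth_peer_index() {
    base="${SYSFS_NET:-/sys/class/net}/$1"
    echo "$1 ifindex=$(cat "$base/ifindex") peer_ifindex=$(cat "$base/iflink")"
}
```

With the two names shown above, veth_peer_index ve-centos7 on the host should report ifindex=5 and peer_ifindex=2, matching ve-centos7@if2 on the host and host0@if5 in the container.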

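The question now is how to access the veth device inside the container from the host, given that a container is just a namespace and not a virtual machine. The catch is that systemd-nspawn uses an anonymous namespace, while ip netns can only handle named network namespaces. After some reading, I found that the namespace files of a process are available under its /proc/<PID>/ns directory. See How to access an unnamed network namespace. That link may be unreachable for well-known reasons, so I have copied its content here: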

Some programs might create network namespaces without registering them in /run/netns as iproute2 does. This makes it hard to access them with readily available tools like ip netns exec. However, there is a way to register those network namespace, after they have been created.

The following session creates and enters an unnamed namespace:
# unshare -n bash
# ip a l
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# echo $$
6508
The pid is the only information which we need. Of course we could also gather that by using ps auxf or a variety of other methods.

Now, to register the namespace, we can run the following in another shell:
# touch /run/netns/new_namespace
# mount -o bind /proc/6508/ns/net /run/netns/new_namespace
After this is done, we can access it like any other network namespace created by iproute2:
# ip netns exec new_namespace bash
# ip link set lo up
And now, if we run ip a l again in the shell we spawned with unshare:
# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever

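So now there is a way to access a process's namespace. To find the namespace that belongs to the container, run ps aux | grep systemd and look for the /lib/systemd/systemd process with the larger PID (the host's own systemd is PID 1). The namespace of that process is the namespace the centos7 container runs in. Mount the container's namespace using the method above, then enter it with ip netns exec <name> bash. Mounting by hand is tedious, though, and it turns out the ip netns command can do the mount automatically. Taking the centos7 container as an example, the following commands, run on the host, mount its namespace at /run/netns/centos7: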

ip netns attach centos7 <PID>  # <PID> is the host-side PID of the container's systemd
ip netns exec centos7 bash # switch into the namespace of the centos7 container

After that, ip addr shows the host0 interface. With this, I finally understand the veth interface that systemd-nspawn sets up.
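
Picking the right PID out of ps output by eye is error-prone, so the lookup can be scripted. This is a minimal sketch under a few assumptions: the container is registered with systemd-machined under the name centos7, machinectl show prints its leader process as a line of the form Leader=<PID>, and the installed iproute2 supports ip netns attach. The function name attach_machine_netns is hypothetical.

```shell
#!/bin/sh
# Register the network namespace of a systemd-nspawn machine under a name
# that ip netns understands. Intended to be run as root on the host.
attach_machine_netns() {
    machine="$1"
    # machinectl prints a line of the form "Leader=<PID>"; the leader is the
    # container's init process as seen from the host.
    leader=$(machinectl show "$machine" --property=Leader) || return 1
    pid="${leader#Leader=}"
    [ -n "$pid" ] || return 1
    ip netns attach "$machine" "$pid"
}
```

After attach_machine_netns centos7, the command ip netns exec centos7 ip addr should list host0. Running ip netns delete centos7 removes the registration under /run/netns again; the container keeps its namespace for as long as its processes are alive.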

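Can a veth device be used with a qemu virtual machine? Not directly. See use veth device with qemu. According to that page, qemu cannot use a veth device directly, but it can use a macvtap device attached to a veth device. Let's try it: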

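ip link add veth0 type veth peer name veth1 # create the veth pair
ip link set veth0 master virbr0 # attach veth0 to the virbr0 bridge
ip link add mac-veth1 link veth1 type macvtap mode vepa # create a macvtap device on top of veth1
for dev in veth0 veth1 mac-veth1; do ip link set "$dev" up; done # bring all three devices up, otherwise there is no connectivity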

Edit the virtual machine's XML as follows:

<interface type="ethernet">
  <mac address="6a:47:6c:b4:d0:41"/> <!-- note: this MAC address must be the same as the MAC address of mac-veth1 -->
  <target dev="mac-veth1" managed="no"/>
  <model type="virtio"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
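
Because the MAC address in the XML has to match the one on mac-veth1, it helps to read it from the device and not copy it by hand. The sketch below does that through sysfs and also prints the character device that belongs to the macvtap interface, which is named /dev/tap followed by the interface index. The function name macvtap_info and the SYSFS_NET override are hypothetical.

```shell
#!/bin/sh
# Print the MAC address of a macvtap interface and the path of its
# character device. The device node is named /dev/tap<ifindex>.
# SYSFS_NET is a hypothetical override used only for testing.
macvtap_info() {
    base="${SYSFS_NET:-/sys/class/net}/$1"
    mac=$(cat "$base/address")
    idx=$(cat "$base/ifindex")
    echo "mac=$mac chardev=/dev/tap$idx"
}
```

Running macvtap_info mac-veth1 prints the value to paste into the <mac address="..."/> element of the interface definition.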

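This is essentially the same as the tap setup. The one thing to watch is that the MAC address must be the same as the MAC address of the macvtap device. Boot the VM and try it: the network works! If you would rather not create the macvtap device yourself, you can let the virtual machine software create it, like this: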

<interface type="direct" trustGuestRxFilters="no">
  <mac address="6a:47:6c:b4:d0:41"/>
  <source dev="veth1" mode="vepa"/>
  <model type="virtio"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>

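5. Using macvlan and macvtap devices

macvlan works by creating several virtual interfaces, each with its own MAC address, on top of one physical interface. macvtap is built on macvlan; the difference is that macvtap also creates a device file (/dev/tapX, where X is a number, namely the interface index) so that applications can talk to it. Both support four modes: vepa, bridge, private, and passthru. They differ as follows: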

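vepa: short for Virtual Ethernet Port Aggregator mode. All frames from the virtual interfaces are sent to the external switch. The external switch must support hairpin mode for the virtual interfaces to reach each other.
bridge: all virtual interfaces are connected to each other and can communicate directly.
private: communication between virtual interfaces is forbidden, even if the external switch supports hairpin mode.
passthru: only one virtual interface may be created, and all traffic of the physical interface is forwarded to it. Commonly used with macvtap.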

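For packet flow diagrams, see the reference linked at the beginning of this article.

The difference between macvlan and macvtap is that macvlan is generally used by the system itself, for containers and the like, while macvtap is generally used by applications, such as virtual machines.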

Creation syntax:

ip link add link DEVICE name NAME type { macvlan | macvtap } mode { private | vepa | bridge | passthru  [ nopromisc ] | source }

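Here DEVICE is the target physical interface and NAME is the name of the virtual interface to create. Once created, the interface is usually displayed as "NAME@DEVICE", that is, the virtual interface name followed by the physical interface name.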
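
As a concrete instance of the syntax above, the following sketch creates two macvlan interfaces in bridge mode on one parent, moves each into its own network namespace, and pings from one to the other. Everything specific in it is an assumption for illustration: the function name macvlan_demo, the namespace names mv-ns1 and mv-ns2, the interface names mv1 and mv2, and the 192.168.50.0/24 addresses. It needs root and a parent interface that is up.

```shell
#!/bin/sh
# Create two bridge-mode macvlan interfaces on one parent, place each in
# its own network namespace, and check that they can reach each other.
# All names and addresses here are made up for the example.
macvlan_demo() {
    parent="$1"
    for i in 1 2; do
        ip netns add "mv-ns$i"
        ip link add link "$parent" name "mv$i" type macvlan mode bridge
        # Move the new interface into its namespace, then configure it there.
        ip link set "mv$i" netns "mv-ns$i"
        ip netns exec "mv-ns$i" ip addr add "192.168.50.$i/24" dev "mv$i"
        ip netns exec "mv-ns$i" ip link set "mv$i" up
    done
    # In bridge mode the two macvlan interfaces talk to each other directly.
    ip netns exec mv-ns1 ping -c 1 192.168.50.2
}
```

Call it as macvlan_demo eth0, replacing eth0 with the real parent interface. Changing mode bridge to mode private in the sketch should make the ping fail, which is a quick way to see the difference between the modes. Clean up afterwards with ip netns delete mv-ns1 and ip netns delete mv-ns2.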

systemd-nspawn can also be told to use macvlan. The option --network-macvlan=<interface_name> creates a macvlan virtual interface, where <interface_name> is the name of the underlying physical interface.

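One last point: with macvlan or macvtap, no matter which mode is used, the virtual interfaces cannot communicate directly with the physical interface they are created on.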
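
A commonly used workaround for this limitation, which the original article does not cover, is to give the host its own macvlan interface on the same parent and route guest traffic through it. The sketch below is only an illustration under assumptions: the guests use bridge mode, and the function name host_macvlan_shim, the interface name mv-host, and the example addresses are all hypothetical.

```shell
#!/bin/sh
# Give the host a macvlan interface of its own so that it can reach a
# macvlan or macvtap guest on the same parent interface.
# Arguments: parent interface, address for the shim, IP of the guest.
host_macvlan_shim() {
    parent="$1"
    shim_addr="$2"
    guest_ip="$3"
    ip link add link "$parent" name mv-host type macvlan mode bridge
    ip addr add "$shim_addr" dev mv-host
    ip link set mv-host up
    # Send traffic for the guest through the shim, not through the parent.
    ip route add "$guest_ip/32" dev mv-host
}
```

For example, host_macvlan_shim eth0 192.168.50.250/32 192.168.50.1 would let the host reach a guest at 192.168.50.1 through mv-host, since the shim and the guest are then sibling macvlan interfaces in bridge mode.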

Using Linux virtual network devices - 上帝掉眼泪