[development][dpdk][hugepage] Mounting hugepage memory

Date: 2021-09-13 06:51:19

 

See also:

[development][dpdk][hugepage] Allocating hugepages of different sizes to different NUMA nodes

 

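With the steps above done, the next step is mounting. Hugepage memory has to be mounted (as a hugetlbfs file system) before applications can use it.

Mount it as follows (see the DPDK documentation: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html):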

mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

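For 1G hugepages, the pagesize option must be given explicitly; otherwise the mount uses the default hugepage size. For example, as an /etc/fstab entry: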

nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0

Reference: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

 

On CentOS 7, a systemd unit named dev-hugepages.mount mounts hugetlbfs by default:

[root@dpdk crisp]# mount -l |grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
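
The same check can be scripted. The following is a minimal bash sketch, not something taken from the DPDK or systemd documentation: list_hugetlbfs is a hypothetical helper that prints the mount point and options of every hugetlbfs entry. It reads /proc/mounts by default, and the file argument exists only so the function can be tried on a saved copy.

```shell
#!/bin/bash
# Hypothetical helper: print "<mount point> <options>" for each hugetlbfs mount.
# $1 is the mounts table to read (assumed default: /proc/mounts).
list_hugetlbfs() {
    local mounts=${1:-/proc/mounts}
    local dev dir type opts rest
    [ -r "$mounts" ] || return 0
    # Each line has the form: device mountpoint fstype options dump pass
    while read -r dev dir type opts rest; do
        if [ "$type" = "hugetlbfs" ]; then
            echo "$dir $opts"
        fi
    done < "$mounts"
    return 0
}

list_hugetlbfs
```

A pagesize=1G entry in the options column indicates a 1G mount; in the output shown above no pagesize option is listed.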

 

Reference: https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems/

To quote:

So you are seeing all kinds of weird file systems in the output of mount(8) that are not listed in /etc/fstab, and you wonder what those are, how you can get rid of them, or at least change their mount options.

 

As that page notes, if you want to add this option, you can define the mount in /etc/fstab.

And also disable dev-hugepages.mount at the same time (?):

systemctl mask dev-hugepages.mount

 

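There is one more approach that the page does not mention, and for no particular reason it is the one I naturally prefer:

modify dev-hugepages.mount itself.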

 

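/etc/fstab is also managed by systemd. In fact, /etc/fstab entries and .mount units end up being managed uniformly, with the mount point (the directory path) as the unique key.

Precedence, highest first:  .mount unit under /etc > /etc/fstab > .mount unit under /usr

Quoted from:  https://www.freedesktop.org/software/systemd/man/systemd.mount.html#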

If a mount point is configured in both /etc/fstab and a unit file that is stored below /usr, the former will take precedence. If the unit file is stored below /etc, it will take precedence. 
This means: native unit files take precedence over traditional configuration files, but this is superseded by the rule that configuration in /etc will always take precedence over configuration in /usr.
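
Because the mount point is the key, the name of a mount unit is derived from the path it mounts. The bash sketch below illustrates the mapping for simple paths only. mount_unit_name is a hypothetical helper written for this note; it does not escape special characters (a "-" inside a path component, for example), which the real tool, systemd-escape --path --suffix=mount, handles.

```shell
#!/bin/bash
# Hypothetical helper: map a simple mount point to its mount unit name.
# Assumes path components contain only letters, digits, '_' and '.'.
mount_unit_name() {
    local path=$1
    path=${path#/}            # drop the leading slash
    path=${path%/}            # drop a trailing slash, if any
    if [ -z "$path" ]; then
        echo "-.mount"        # the root file system
        return 0
    fi
    echo "${path//\//-}.mount"    # remaining slashes become dashes
}

mount_unit_name /dev/hugepages
mount_unit_name /mnt/huge
```

This is why an /etc/fstab entry for /dev/hugepages and the unit file dev-hugepages.mount describe the same mount.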

 

 

Based on the above:

  One option is to add a /dev/hugepages entry to /etc/fstab, which overrides dev-hugepages.mount:

[root@dpdk ~]# cat /etc/fstab |grep huge
nodev /dev/hugepages hugetlbfs defaults,nofail,pagesize=1G 0 0
[root@dpdk ~]# 
[root@dpdk ~]# mount -l |grep hugetlbfs
nodev on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=1G)

 

  The other option is to write an /etc/systemd/system/dev-hugepages.mount file with the extra option added, which overrides the default parameters.

  Syntax of the option: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#Options

[root@dpdk ~]# cp /usr/lib/systemd/system/dev-hugepages.mount  /etc/systemd/system/
[root@dpdk ~]# diff /usr/lib/systemd/system/dev-hugepages.mount  /etc/systemd/system/dev-hugepages.mount 
20a21
> Options=pagesize=1G
[root@dpdk ~]# 
[root@dpdk ~]# mount -l |grep hugepages
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=1G)
[root@dpdk ~]# 

 

At this point only the basic system preparation is done. No hugepages have actually been reserved yet:

[root@dpdk ~]# numastat -m |grep Huge
AnonHugePages               8.00            0.00            8.00
HugePages_Total             0.00             0.00            0.00
HugePages_Free              0.00             0.00            0.00
HugePages_Surp              0.00            0.00            0.00
[root@dpdk ~]# 

 

Manual method:

[root@dpdk ~]# echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages 
[root@dpdk ~]# numastat -m |grep Huge
AnonHugePages               8.00            0.00            8.00
HugePages_Total             0.00         2048.00         2048.00
HugePages_Free              0.00         2048.00         2048.00
HugePages_Surp              0.00            0.00            0.00
[root@dpdk ~]# 

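This reserves hugepages on node1 only; node0 has none. Note that numastat reports sizes in MB, so 2048.00 corresponds to two 1G pages even though four were requested. The kernel reserves only as many pages as it can actually allocate, so always check the resulting count.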
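
To see what was actually reserved on each node, the per-node sysfs counters can be read back directly. The following is a minimal bash sketch: show_1g_pages is a hypothetical helper, and its optional argument (the sysfs node directory) is there only so the function can be tried against a copy of the tree. It prints the node name, the number of 1G pages, and the size in MB.

```shell
#!/bin/bash
# Hypothetical helper: print "<node> <1G pages> <MB>" for every NUMA node.
# $1 is the node directory (assumed default: /sys/devices/system/node).
show_1g_pages() {
    local sysfs_root=${1:-/sys/devices/system/node}
    local f node pages
    for f in "$sysfs_root"/node*/hugepages/hugepages-1048576kB/nr_hugepages; do
        [ -r "$f" ] || continue       # skips the unexpanded glob when nothing matches
        node=${f#"$sysfs_root"/}      # strip the directory prefix
        node=${node%%/*}              # keep only the "nodeN" component
        pages=$(cat "$f")
        echo "$node $pages $((pages * 1024))"
    done
    return 0
}

show_1g_pages
```

With the state shown above, node1 would be reported with 2 pages, that is 2048 MB.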

 

Automatic method: write a service, as follows:

╰─>$ cat hugetlb-gigantic-pages.service 
[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/hugetlb-reserve-pages

[Install]
WantedBy=sysinit.target
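
One detail of this unit is easy to miss: as I understand ConditionKernelCommandLine=hugepagesz=1G, systemd skips the unit unless the kernel was booted with exactly that parameter. The bash sketch below approximates the check. cmdline_has is a hypothetical helper, and the file argument (default /proc/cmdline) is there only for trying it out on sample input.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the exact word $1 appears on the kernel
# command line. $2 is the file to read (assumed default: /proc/cmdline).
cmdline_has() {
    local want=$1 file=${2:-/proc/cmdline}
    local -a words
    local word
    [ -r "$file" ] || return 1
    read -r -a words < "$file"        # the command line is a single line
    for word in "${words[@]}"; do
        [ "$word" = "$want" ] && return 0
    done
    return 1
}

if cmdline_has hugepagesz=1G; then
    echo "hugepagesz=1G is set: the condition should be met"
else
    echo "hugepagesz=1G is not set: systemd would skip the unit"
fi
```

The match is on the whole word, so default_hugepagesz=1G alone does not count.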

 

╰─>$ cat hugetlb-reserve-pages 
#!/bin/bash

nodes_path=/sys/devices/system/node
if [ ! -d "$nodes_path" ]; then
        echo "ERROR: $nodes_path does not exist" >&2
        exit 1
fi

# Usage: reserve_pages <number of 1G pages> <node name>
reserve_pages()
{
        echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
}

# This example reserves two 1G pages on node0 and two 1G pages on node1.
# Modify it to your needs, or add more lines to reserve memory on
# other nodes.

reserve_pages 2 node0
reserve_pages 2 node1

 

Then, with the script installed at the ExecStart path and made executable, enable the service and reboot:

systemctl enable hugetlb-gigantic-pages.service