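See also:
[development][dpdk][hugepage] Allocating different amounts of huge page memory on different nodes
With the above done, the next step is mounting. Huge page memory can only be used by applications once a hugetlbfs file system has been mounted.
Mount it as follows (see the DPDK documentation: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html):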
mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
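If you have 1G huge pages, you must pass the pagesize=1G option explicitly, otherwise the kernel's default huge page size is used. For example, as an /etc/fstab entry: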
nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0
Reference: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
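To see what the "default size" actually is on a given machine, and whether 1G pages are available at all, something like the following can help. This is only a sketch: the function names default_hugepage_kb and supported_hugepage_kb are my own, and the optional path arguments exist only so the functions can be pointed at test data instead of /proc/meminfo and /sys/kernel/mm/hugepages.

```shell
#!/usr/bin/env bash

# Print the kernel's default huge page size in kB, taken from the
# "Hugepagesize:" line of /proc/meminfo.
# $1 (optional, hypothetical test hook): alternative meminfo file.
default_hugepage_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '$1 == "Hugepagesize:" { print $2 }' "$meminfo"
}

# Print every huge page size (in kB) the running kernel exposes,
# one per line, based on the hugepages-<size>kB directories in sysfs.
# $1 (optional, hypothetical test hook): alternative sysfs directory.
supported_hugepage_kb() {
    local root="${1:-/sys/kernel/mm/hugepages}"
    local dir name
    for dir in "$root"/hugepages-*kB; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name="${dir##*/hugepages-}"        # e.g. "1048576kB"
        echo "${name%kB}"                  # e.g. "1048576"
    done
}
```

If supported_hugepage_kb does not list 1048576, a pagesize=1G mount cannot be expected to work on that machine.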
On CentOS 7 there is a systemd unit, dev-hugepages.mount, which mounts hugetlbfs by default:
[root@dpdk crisp]# mount -l |grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
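The same check can be scripted. The sketch below lists every hugetlbfs mount together with its pagesize option, or "default" when the mount options show none (as in the output above). The function name hugetlbfs_mounts is made up for this example, and the optional argument is only there so it can read a file other than /proc/mounts.

```shell
#!/usr/bin/env bash

# Print "<mount point> <page size>" for every hugetlbfs mount.
# The page size is taken from the pagesize= mount option; if the option
# is not shown, "default" is printed instead.
# $1 (optional, hypothetical test hook): file in /proc/mounts format.
hugetlbfs_mounts() {
    local mounts="${1:-/proc/mounts}"
    awk '$3 == "hugetlbfs" {
        size = "default"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^pagesize=/)
                size = substr(opts[i], 10)   # text after "pagesize="
        print $2, size
    }' "$mounts"
}
```

Run without arguments, it reads /proc/mounts of the current system.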
This is explained in the systemd documentation on API file systems: https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems/
To quote a passage:
So you are seeing all kinds of weird file systems in the output of mount(8) that are not listed in /etc/fstab, and you wonder what those are, how you can get rid of them, or at least change their mount options.
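That page says that if you want to add this option, you can do the mount in /etc/fstab.
And disable dev-hugepages.mount at the same time??? (I am not sure this is needed. A mask is itself an entry under /etc/systemd/system, so by the precedence rule quoted further down it would presumably also shadow an fstab entry for the same mount point, /dev/hugepages.)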
systemctl mask dev-hugepages.mount
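Of course there is another trick that the page does not mention, and for some reason I instinctively prefer it:
modify dev-hugepages.mount itself.
/etc/fstab is also handled by systemd. In fact, /etc/fstab entries and .mount unit files end up being managed together, with the mount point (the directory name) as the unique identifier.
Precedence, from highest to lowest: .mount unit under /etc > /etc/fstab > .mount unit under /usr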
Quoted from: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#
If a mount point is configured in both /etc/fstab and a unit file that is stored below /usr, the former will take precedence. If the unit file is stored below /etc, it will take precedence.
This means: native unit files take precedence over traditional configuration files, but this is superseded by the rule that configuration in /etc will always take precedence over configuration in /usr.
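The precedence rule quoted above can be written down as a small lookup. This is a sketch under assumptions, not how systemd implements it: the function name mount_config_source is made up, only the two unit directories /etc/systemd/system and /usr/lib/systemd/system are considered, and the unit name is derived by simply replacing "/" with "-", which is only right for paths that need no further escaping (such as /dev/hugepages).

```shell
#!/usr/bin/env bash

# Print which configuration source wins for a mount point:
#   etc-unit  (.mount unit under /etc/systemd/system)
#   fstab     (entry in /etc/fstab)
#   usr-unit  (.mount unit under /usr/lib/systemd/system)
#   none
# $1: mount point, e.g. /dev/hugepages
# $2 (optional, hypothetical test hook): alternative filesystem root.
mount_config_source() {
    local mount_point="$1" root="${2:-}"
    local unit
    # Simplified unit naming: /dev/hugepages -> dev-hugepages.mount
    unit="$(printf '%s' "${mount_point#/}" | tr '/' '-').mount"

    if [ -e "$root/etc/systemd/system/$unit" ]; then
        # Note: a mask (symlink to /dev/null) also lands in this branch.
        echo "etc-unit"
    elif [ -r "$root/etc/fstab" ] &&
         awk -v mp="$mount_point" \
             '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' \
             "$root/etc/fstab"; then
        echo "fstab"
    elif [ -e "$root/usr/lib/systemd/system/$unit" ]; then
        echo "usr-unit"
    else
        echo "none"
    fi
}
```

For example, mount_config_source /dev/hugepages on a stock system with no fstab entry would be expected to print usr-unit.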
Based on the above:
You can either add a /dev/hugepages mount point to /etc/fstab, which overrides dev-hugepages.mount:
[root@dpdk ~]# cat /etc/fstab |grep huge
nodev /dev/hugepages hugetlbfs defaults,nofail,pagesize=1G 0 0
[root@dpdk ~]#
[root@dpdk ~]# mount -l |grep hugetlbfs
nodev on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=1G)
Or write a /etc/systemd/system/dev-hugepages.mount file with the extra option added, which overrides the default parameters.
For the exact syntax of the option, see: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#Options
[root@dpdk ~]# cp /usr/lib/systemd/system/dev-hugepages.mount /etc/systemd/system/
[root@dpdk ~]# diff /usr/lib/systemd/system/dev-hugepages.mount /etc/systemd/system/dev-hugepages.mount
20a21
> Options=pagesize=1G
[root@dpdk ~]#
[root@dpdk ~]# mount -l |grep hugepages
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=1G)
[root@dpdk ~]#
At this point only the basic system preparation is done. No huge pages have actually been reserved yet:
[root@dpdk ~]# numastat -m |grep Huge
AnonHugePages       8.00       0.00       8.00
HugePages_Total     0.00       0.00       0.00
HugePages_Free      0.00       0.00       0.00
HugePages_Surp      0.00       0.00       0.00
[root@dpdk ~]#
Manual method:
[root@dpdk ~]# echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
[root@dpdk ~]# numastat -m |grep Huge
AnonHugePages       8.00       0.00       8.00
HugePages_Total     0.00    2048.00    2048.00
HugePages_Free      0.00    2048.00    2048.00
HugePages_Surp      0.00       0.00       0.00
[root@dpdk ~]#
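This way huge pages are allocated only on node1, and none on node0. Note that numastat reports megabytes, so the 2048.00 shown for node1 corresponds to two 1G pages rather than the four requested. A run-time request can end up only partially satisfied when not enough contiguous memory is free, which may be what happened here, so it is worth reading the value back.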
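To read the per-node result back in pages instead of megabytes, the sysfs counters can be printed directly. A minimal sketch: the function name node_hugepages is made up, and the second argument exists only so the function can be run against a fake directory tree instead of /sys/devices/system/node.

```shell
#!/usr/bin/env bash

# Print "<node> <nr_hugepages> <free_hugepages>" for each NUMA node.
# $1 (optional): huge page size in kB, default 1048576 (1G pages).
# $2 (optional, hypothetical test hook): alternative sysfs node directory.
node_hugepages() {
    local size_kb="${1:-1048576}" root="${2:-/sys/devices/system/node}"
    local dir hp
    for dir in "$root"/node[0-9]*; do
        hp="$dir/hugepages/hugepages-${size_kb}kB"
        # Skip nodes (or an unmatched glob) without this page size.
        [ -r "$hp/nr_hugepages" ] || continue
        echo "${dir##*/} $(cat "$hp/nr_hugepages") $(cat "$hp/free_hugepages")"
    done
}
```

Comparing the nr_hugepages column with the number written to the file shows whether the kernel could satisfy the whole request.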
Automatic method: write a service, like this:
╰─>$ cat hugetlb-gigantic-pages.service
[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/hugetlb-reserve-pages

[Install]
WantedBy=sysinit.target
╰─>$ cat hugetlb-reserve-pages
#!/bin/bash

nodes_path=/sys/devices/system/node
if [ ! -d "$nodes_path" ]; then
    echo "ERROR: $nodes_path does not exist"
    exit 1
fi

# Usage: reserve_pages <number of 1G pages> <node name>
reserve_pages()
{
    echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
}

# This example reserves two 1G pages on node0 and two on node1. You
# can modify it to your needs or add more lines to reserve memory in
# other nodes.
reserve_pages 2 node0
reserve_pages 2 node1
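Then install the unit file under /etc/systemd/system/ and the script as an executable /sbin/hugetlb-reserve-pages (the path given in ExecStart), enable the service, and reboot: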
systemctl enable hugetlb-gigantic-pages.service
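One thing to keep in mind: because of ConditionKernelCommandLine=hugepagesz=1G, the service is skipped unless the kernel command line contains exactly that assignment. A quick way to check before rebooting is sketched below. The function name cmdline_has_arg is made up, the optional file argument is only a test hook, and the splitting on whitespace ignores quoted kernel arguments that contain spaces.

```shell
#!/usr/bin/env bash

# Return 0 if the kernel command line contains the exact argument $1
# (for example "hugepagesz=1G"), 1 otherwise.
# $2 (optional, hypothetical test hook): file to read instead of /proc/cmdline.
cmdline_has_arg() {
    local wanted="$1" cmdline="${2:-/proc/cmdline}"
    local -a args
    local arg
    # Split the single command line on whitespace into an array.
    # read returns non-zero at EOF without a newline, hence "|| true".
    read -r -a args < "$cmdline" || true
    for arg in "${args[@]}"; do
        [ "$arg" = "$wanted" ] && return 0
    done
    return 1
}
```

A possible use is cmdline_has_arg hugepagesz=1G || echo "service condition will not be met". After the reboot, systemctl status hugetlb-gigantic-pages.service should show whether the unit actually ran.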