How does Linux support more than 512 GB of virtual address range on x86-64?

On x86-64, the Linux user virtual address space is 47 bits wide, which means Linux can map a process with a virtual address range of about 128 TiB.

However, what confuses me is that the x86-64 architecture defines a 4-level hierarchical page table (arranged as a radix tree) for each process. The root of the page table can only map up to 512 GB of contiguous virtual address space. So how can Linux support more than 512 GB of virtual address range? Does it use multiple page tables for each process? If so, what should CR3 (the x86-64 register that holds the base address of the page table) contain for any given process? Am I missing something?

2 answers

#1

The root of the page table can only map up to 512 GB of contiguous virtual address space. So how can Linux support more than 512 GB of virtual address range? Does it use multiple page tables for each process? If so, what should CR3 (the x86-64 register that holds the base address of the page table) contain for any given process? Am I missing something?

I don't know what you mean by "root of the page table", but paging on x86-64 looks like this:

Page tables (PT) - the lowest level of the paging structures. Each has 512 8-byte entries (PTEs), each describing one 4 KiB page, so a PT describes 512 * 4 KiB = 2 MiB of memory (the PD entry that would point to a PT can instead map a single 2 MiB page directly, but let's leave that aside for now).

Page directories (PD) - tables similar to PTs, containing 512 8-byte entries (PDEs) pointing to PTs; so a PD describes 512 * 2 MiB = 1 GiB of memory (similarly, the PDPT entry that would point to a PD can instead map a single 1 GiB page directly).

Page directory pointer tables (PDPT) - similar to PDs, but containing 512 8-byte entries (PDPTEs) pointing to PDs; so a PDPT describes 512 * 1 GiB = 512 GiB of memory.

PML4, the highest level of the paging structures, is a table containing 512 8-byte entries (PML4Es) pointing to PDPTs; so the PML4 describes 512 * 512 GiB = 256 TiB of memory. CR3 holds the physical address of this one table.
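
The per-level arithmetic above can be checked with a short sketch. It assumes 4 KiB base pages and 512 entries per table, as described in this answer; the names `LEVELS` and `coverage_bytes` are made up for illustration.

```python
# Span of virtual address space covered by ONE table at each paging level,
# assuming 4 KiB base pages and 512 eight-byte entries per table.
PAGE_SIZE = 4 * 1024        # one PTE maps one 4 KiB page
ENTRIES_PER_TABLE = 512     # 4 KiB table / 8 bytes per entry

LEVELS = ["PT", "PD", "PDPT", "PML4"]   # lowest to highest


def coverage_bytes():
    """Return {level name: bytes spanned by a single table of that level}."""
    result = {}
    span = PAGE_SIZE
    for name in LEVELS:
        span *= ENTRIES_PER_TABLE   # each level multiplies the span by 512
        result[name] = span
    return result


COVERAGE = coverage_bytes()

if __name__ == "__main__":
    for name, size in COVERAGE.items():
        # Every span is a power of two, so report it as an exponent.
        print(f"{name:5s} spans 2**{size.bit_length() - 1} bytes")
```

A single PDPT spans 2**39 bytes (512 GiB), which is probably the figure the question has in mind, while the PML4 above it spans 2**48 bytes (256 TiB).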

I don't know the exact memory map of Linux, but the higher half (from -128 TiB to 0, i.e. from 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF) is probably reserved for the kernel, and the lower half (from 0 to 128 TiB, i.e. from 0x0000000000000000 to 0x00007FFFFFFFFFFF) is for userspace applications. So the address space Linux works with is 512 times the 512 GiB you are asking about, with half of it (256 PML4 entries) available to userspace; even Torvalds wouldn't say "we won't support PML4". I don't know what confuses you. Perhaps you missed that a page table maps 2 MiB and assumed it maps only one 4 KiB page? If there is anything I can clarify, ask about it.
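
To make the split between the two halves concrete, here is a minimal sketch of how a 48-bit virtual address decomposes into the four table indices. It assumes 4-level paging with 4 KiB pages; the helper names `is_canonical` and `split_vaddr` are hypothetical and only for illustration.

```python
def is_canonical(vaddr):
    """With 48-bit addressing, bits 63..47 must all be copies of bit 47."""
    top = vaddr >> 47               # the upper 17 bits of a 64-bit address
    return top == 0 or top == 0x1FFFF


def split_vaddr(vaddr):
    """Split a virtual address into its 4-level paging indices (4 KiB pages)."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,   # bits 47..39 select the PML4 entry
        "pdpt": (vaddr >> 30) & 0x1FF,   # bits 38..30 select the PDPT entry
        "pd": (vaddr >> 21) & 0x1FF,     # bits 29..21 select the PD entry
        "pt": (vaddr >> 12) & 0x1FF,     # bits 20..12 select the PT entry
        "offset": vaddr & 0xFFF,         # bits 11..0 are the byte within the page
    }


if __name__ == "__main__":
    for addr in (0x00007FFFFFFFFFFF, 0xFFFF800000000000):
        print(hex(addr), is_canonical(addr), split_vaddr(addr))
```

The last userspace address lands in PML4 entry 255 and the first address of the higher half lands in entry 256, so each half uses 256 of the 512 PML4 entries, and each entry covers 512 GiB.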

#2

Typically, process address spaces aren't shared, which means the page tables involved aren't shared between distinct processes either, and that holds at all 4 table levels.

Of course, the common (kernel) part is always present in all address spaces, so, in fact, there's some sharing, but the memory there is only accessible to the kernel itself.

Other than that, pretty much every process does indeed have its own page tables, and there isn't any problem with using all 2⁴⁸ addresses in any one of them. At least, there's no special limitation on the part of the CPU, although there can be on the part of the OS.
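
A toy model may help picture this: one root table per process, with only the kernel half shared. This is a sketch under loud assumptions, not how Linux implements it: tables are sparse Python dicts, "CR3" is just a reference to the root dict, and permission bits, huge pages and TLBs are ignored. All names (`new_address_space`, `map_page`, `translate`) are hypothetical.

```python
KERNEL_PML4_START = 256   # PML4 entries 256..511 cover the higher (kernel) half


def new_address_space(kernel_entries):
    """Build a toy PML4 for a new process: private lower half, shared kernel half."""
    pml4 = {}                      # sparse table: index -> next-level table
    pml4.update(kernel_entries)    # every process references the same kernel PDPTs
    return pml4


def map_page(pml4, vaddr, frame):
    """Install one 4 KiB mapping, creating intermediate tables on demand."""
    table = pml4
    for shift in (39, 30, 21):     # PML4 -> PDPT -> PD -> PT
        table = table.setdefault((vaddr >> shift) & 0x1FF, {})
    table[(vaddr >> 12) & 0x1FF] = frame


def translate(cr3, vaddr):
    """Walk the tables from the root in cr3; return a physical address or None."""
    table = cr3
    for shift in (39, 30, 21):
        table = table.get((vaddr >> shift) & 0x1FF)
        if table is None:
            return None            # no table at this level: address is unmapped
    frame = table.get((vaddr >> 12) & 0x1FF)
    if frame is None:
        return None
    return (frame << 12) | (vaddr & 0xFFF)


if __name__ == "__main__":
    kernel_half = {KERNEL_PML4_START: {}}      # one shared kernel PDPT
    proc_a = new_address_space(kernel_half)
    proc_b = new_address_space(kernel_half)
    map_page(proc_a, 0x400000, frame=0x1234)
    print(translate(proc_a, 0x400123))         # mapped in process A
    print(translate(proc_b, 0x400123))         # not mapped in process B
```

Switching processes in this model just means handing `translate` a different root, which mirrors loading a different PML4 address into CR3; no second page table per process is needed to go beyond 512 GiB.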
