Linux Memory Management Study 1: Building the Section Page Table in head.S

Date: 2025-01-08 08:03:55

Author

彭东林

pengdonglin137@163.com

Platform

TQ2440

Qemu+vexpress-ca9

Linux-4.10.17

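Overview

After the Linux kernel has decompressed itself, execution starts in arch/arm/kernel/head.S and then jumps to start_kernel in init/main.c. To make early boot possible, head.S builds a temporary section-mapped page table. This article uses the TQ2440 and vexpress-ca9 as examples. The TQ2440 uses the S3C2440 SoC, whose core is an ARM920T implementing the ARMv4T instruction set, while vexpress-ca9 uses a Cortex-A9 core implementing ARMv7. To keep things easy to follow, the analysis focuses on the 2440 and only mentions ARMv7 in passing, since the two are largely the same.

The code analysis below assumes the following:

1. The kernel is booted with a device tree.

2. The macros and variables involved are: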

Name                            Description                                      TQ2440 (ARM920T)   vexpress (Cortex-A9)
CONFIG_ARM_LPAE                                                                  No                 No
TEXT_OFFSET                     Offset of the kernel text within kernel space    0x8000             0x8000
PAGE_OFFSET                     Start of the kernel address space                0xC000_0000        0xC000_0000
KERNEL_RAM_VADDR                = PAGE_OFFSET + TEXT_OFFSET                      0xC000_8000        0xC000_8000
PG_DIR_SIZE                     Size of the first-level page table               0x4000 (16KB)      0x4000 (16KB)
PMD_ORDER                       Bytes per first-level entry = 2^PMD_ORDER        2^2 = 4            2^2 = 4
swapper_pg_dir                  Virtual start of the first-level page table,     0xC000_4000        0xC000_4000
                                = KERNEL_RAM_VADDR - PG_DIR_SIZE
CONFIG_ARM_VIRT_EXT                                                              No                 Yes
CONFIG_XIP_KERNEL                                                                No                 No
CONFIG_SMP                                                                       No                 Yes
CONFIG_SMP_ON_UP                                                                 No                 Yes
CONFIG_ARM_PATCH_PHYS_VIRT                                                       Yes                Yes
CONFIG_CPU_32v4T                ARM instruction set                              Yes                No
CONFIG_CPU_32v7                 ARM instruction set                              No                 Yes
CONFIG_CPU_V7M                  ARM instruction set                              No                 No
__LINUX_ARM_ARCH__              ARM instruction set                              4                  7
CONFIG_CPU_DCACHE_WRITETHROUGH                                                   No                 No
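The derived entries in the table (KERNEL_RAM_VADDR, PG_DIR_SIZE and swapper_pg_dir) follow from the others by simple arithmetic. The C sketch below reproduces them. The function names and SECTION_SHIFT are introduced here purely for illustration, and the sketch assumes the non-LPAE layout in which each first-level entry maps a 1 MB section.

```c
#include <stdint.h>

/* Values taken from the table (identical for both boards). */
#define PAGE_OFFSET   UINT32_C(0xC0000000)
#define TEXT_OFFSET   UINT32_C(0x00008000)
#define PMD_ORDER     2u   /* each first-level entry is 2^2 = 4 bytes */
#define SECTION_SHIFT 20u  /* assumption: one entry maps a 1 MB section */

/* Virtual address at which the kernel text starts. */
uint32_t kernel_ram_vaddr(void)
{
    return PAGE_OFFSET + TEXT_OFFSET;
}

/* Size of the first-level table: one entry per 1 MB of the 4 GB space. */
uint32_t pg_dir_size(void)
{
    uint32_t entries = UINT32_C(1) << (32u - SECTION_SHIFT); /* 4096 */
    return entries << PMD_ORDER;                             /* 16 KB */
}

/* The table sits immediately below the kernel text. */
uint32_t swapper_pg_dir_vaddr(void)
{
    return kernel_ram_vaddr() - pg_dir_size();
}
```

Calling swapper_pg_dir_vaddr() yields 0xC0004000, matching the table.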

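3. Address space:

The TQ2440 board has 64MB of physical memory, so its physical memory range is 0x3000_0000 ~ 0x3400_0000 (end address exclusive).

The vexpress board is given 1GB of physical memory, so its physical memory range is 0x6000_0000 ~ 0xA000_0000 (end address exclusive).

Main Text

On entry to head.S the MMU and D-Cache are off, r0 is 0, r1 may hold any value, and r2 holds the physical start address of the dtb image in memory.

Below is a trimmed-down version of the code in head.S: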

ENTRY(stext)

#ifdef CONFIG_ARM_VIRT_EXT
    bl      __hyp_stub_install
#endif
    @ ensure svc mode and all interrupts masked
    safe_svcmode_maskall r9

    mrc     p15, 0, r9, c0, c0          @ get processor id
    bl      __lookup_processor_type     @ r5=procinfo r9=cpuid
    movs    r10, r5                     @ invalid processor (r5=0)?
    beq     __error_p                   @ yes, error 'p'

    adr     r3, 2f
    ldmia   r3, {r4, r8}
    sub     r4, r3, r4                  @ (PHYS_OFFSET - PAGE_OFFSET)
    add     r8, r8, r4                  @ PHYS_OFFSET

    /*
     * r1 = machine no, r2 = atags or dtb,
     * r8 = phys_offset, r9 = cpuid, r10 = procinfo
     */
    bl      __vet_atags
#ifdef CONFIG_SMP_ON_UP
    bl      __fixup_smp
#endif
    bl      __fixup_pv_table
    bl      __create_page_tables

    /*
     * The following calls CPU specific code in a position independent
     * manner. See arch/arm/mm/proc-*.S for details. r10 = base of
     * xxx_proc_info structure selected by __lookup_processor_type
     * above.
     *
     * The processor init function will be called with:
     * r1 - machine type
     * r2 - boot data (atags/dt) pointer
     * r4 - translation table base (low word)
     * r5 - translation table base (high word, if LPAE)
     * r8 - translation table base (pfn if LPAE)
     * r9 - cpuid
     * r13 - virtual address for __enable_mmu -> __turn_mmu_on
     *
     * On return, the CPU will be ready for the MMU to be turned on,
     * r0 will hold the CPU control register value, r1, r2, r4, and
     * r9 will be preserved. r5 will also be preserved if LPAE.
     */
    ldr     r13, =__mmap_switched       @ address to jump to after
                                        @ mmu has been enabled
    badr    lr, 1f                      @ return (PIC) address
    mov     r8, r4                      @ set TTBR1 to swapper_pg_dir
    ldr     r12, [r10, #PROCINFO_INITFUNC]
    add     r12, r12, r10
    ret     r12
1:  b       __enable_mmu
ENDPROC(stext)
    .ltorg
2:  .long   .
    .long   PAGE_OFFSET

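The code above is analyzed step by step below.

1. The call to __hyp_stub_install (guarded by CONFIG_ARM_VIRT_EXT) is executed on vexpress but not on the 2440. It is ignored for now.

2. safe_svcmode_maskall r9 makes sure the processor is in SVC mode with both IRQ and FIQ masked. On the 2440 it boils down to: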

msr  cpsr_c, #(PSR_F_BIT | PSR_I_BIT | SVC_MODE)
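To make the operand of that msr concrete, here is a small C sketch that builds the same value and checks a CPSR against it. The three constants are the usual ARM PSR encodings as I understand them from the kernel's ptrace.h; MODE_MASK and both function names are illustrative only.

```c
#include <stdint.h>

#define SVC_MODE  UINT32_C(0x13) /* supervisor mode encoding */
#define PSR_F_BIT UINT32_C(0x40) /* FIQ disable bit */
#define PSR_I_BIT UINT32_C(0x80) /* IRQ disable bit */
#define MODE_MASK UINT32_C(0x1f) /* low five bits select the mode */

/* Value written to the control byte of the CPSR by the msr above. */
uint32_t svc_maskall_value(void)
{
    return PSR_F_BIT | PSR_I_BIT | SVC_MODE;
}

/* Non-zero if the given CPSR is in SVC mode with IRQ and FIQ masked. */
int is_svc_with_irqs_masked(uint32_t cpsr)
{
    return (cpsr & MODE_MASK) == SVC_MODE &&
           (cpsr & PSR_I_BIT) != 0 &&
           (cpsr & PSR_F_BIT) != 0;
}
```

svc_maskall_value() evaluates to 0xd3, the value commonly seen in the low byte of the CPSR during early boot.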

3. mrc p15, 0, r9, c0, c0 reads the processor ID.

On the 2440, CP15 register c0 reads 0x4112920x; see section 2.3, "CP15 register map summary", of the ARM920T Technical Reference Manual.

On vexpress, CP15 register c0 reads 0x414FC091; see chapter 4, "System Control", of the ARM Cortex-A9 Technical Reference Manual.

So after the mrc instruction executes, r9 holds 0x4112920x on the 2440 and 0x414FC091 on vexpress.
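The two ID values are easier to read once split into fields. The sketch below decodes the Main ID Register using the field layout documented in the two reference manuals (implementer, variant, architecture, part number, revision). The struct and function names are made up for this illustration.

```c
#include <stdint.h>

/* Fields of the CP15 c0 Main ID Register. */
struct midr_fields {
    uint32_t implementer;  /* bits [31:24], 0x41 = 'A' for ARM Ltd. */
    uint32_t variant;      /* bits [23:20] */
    uint32_t architecture; /* bits [19:16] */
    uint32_t part_number;  /* bits [15:4]  */
    uint32_t revision;     /* bits [3:0]   */
};

/* Split a raw ID value, as read into r9, into its fields. */
struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;

    f.implementer  = (midr >> 24) & 0xffu;
    f.variant      = (midr >> 20) & 0xfu;
    f.architecture = (midr >> 16) & 0xfu;
    f.part_number  = (midr >> 4)  & 0xfffu;
    f.revision     = midr & 0xfu;
    return f;
}
```

For 0x41129200 the part number field is 0x920 (ARM920T), and for 0x414FC091 it is 0xC09 (Cortex-A9) with revision 1.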

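4. bl __lookup_processor_type, movs r10, r5 and beq __error_p: __lookup_processor_type walks the kernel's ".proc.info.init" section looking for the proc_info_list structure that matches the processor ID. On a match, r5 holds the physical address of that proc_info_list, and the movs copies it into r10. If nothing matches, r5 is 0, so the movs sets the Z flag, the beq is taken, and execution jumps to __error_p. If CONFIG_DEBUG_LL is enabled, an error message is printed: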

Error: unrecognized/unsupported processor variant (0xXXXXXXX)

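The value in parentheses is what was actually read from CP15 register c0.

Next, let's see which ".proc.info.init" entries match the 2440 and vexpress boards.

For the 2440, the entry is defined in arch/arm/mm/proc-arm920.S: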

    define_processor_functions arm920, dabort=v4t_early_abort, pabort=legacy_pabort, suspend=1

    .section ".rodata"

    string    cpu_arch_name, "armv4t"
string cpu_elf_name, "v4"
string cpu_arm920_name, "ARM920T"

.align

.section ".proc.info.init", #alloc

.type __arm920_proc_info,#object
__arm920_proc_info:
.long 0x41009200
.long 0xff00fff0
.long PMD_TYPE_SECT | \
PMD_SECT_BUFFERABLE | \
PMD_SECT_CACHEABLE | \
PMD_BIT4 | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ
.long PMD_TYPE_SECT | \
PMD_BIT4 | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ
initfn __arm920_setup, __arm920_proc_info
.long cpu_arch_name
.long cpu_elf_name
.long HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
.long cpu_arm920_name
.long arm920_processor_functions
.long v4wbi_tlb_fns
.long v4wb_user_fns
.long arm920_cache_fns
.size __arm920_proc_info, . - __arm920_proc_info

define_processor_functions is a macro defined in arch/arm/mm/proc-macros.S. With the arguments passed here it expands to:

    .type    arm920_processor_functions, #object
.align 2
ENTRY(arm920_processor_functions)
.word v4t_early_abort
.word legacy_pabort
.word cpu_arm920_proc_init
.word cpu_arm920_proc_fin
.word cpu_arm920_reset
.word cpu_arm920_do_idle
.word cpu_arm920_dcache_clean_area
.word cpu_arm920_switch_mm
.word cpu_arm920_set_pte_ext
.word cpu_arm920_suspend_size
.word cpu_arm920_do_suspend
.word cpu_arm920_do_resume
.size arm920_processor_functions, . - arm920_processor_functions

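The strings placed in the read-only ".rodata" section (cpu_arch_name, cpu_elf_name and cpu_arm920_name) are printed later during boot (start_kernel --> setup_arch --> setup_processor):

    pr_info("CPU: %s [%08x] revision %d (ARMv%s), cr=%08lx\n",
            cpu_name, read_cpuid_id(), read_cpuid_id() & 15,
            proc_arch[cpu_architecture()], get_cr());

For example: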

[    0.000000] CPU: ARM920T [41129200] revision 0 (ARMv4T), cr=c000717f

The data from the __arm920_proc_info label up to the final .size directive is later accessed through a struct proc_info_list:

struct proc_info_list {
unsigned int cpu_val;
unsigned int cpu_mask;
unsigned long __cpu_mm_mmu_flags; /* used by head.S */
unsigned long __cpu_io_mmu_flags; /* used by head.S */
unsigned long __cpu_flush; /* used by head.S */
const char *arch_name;
const char *elf_name;
unsigned int elf_hwcap;
const char *cpu_name;
struct processor *proc;
struct cpu_tlb_fns *tlb;
struct cpu_user_fns *user;
struct cpu_cache_fns *cache;
};

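initfn __arm920_setup, __arm920_proc_info expands to a .long holding __arm920_setup - __arm920_proc_info. In other words, the entry stores the offset between the two symbols, so __arm920_setup can easily be located from the address of __arm920_proc_info.

v4wbi_tlb_fns and arm920_cache_fns are, like arm920_processor_functions, generated by macro expansion, so searching the source for them directly finds nothing.

v4wbi_tlb_fns is defined in arch/arm/mm/tlb-v4wbi.S by define_tlb_functions v4wbi, v4wbi_tlb_flags, which expands to: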
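Storing an offset instead of an absolute address is what lets stext call the setup function before the MMU is on: the same offset is valid whether it is added to the link-time or the run-time address of the proc info entry. The C sketch below models this with 32-bit wrap-around arithmetic. The function names are illustrative, and the sample numbers in the note come from the ARM920T disassembly quoted in this article.

```c
#include <stdint.h>

/* What `initfn func, base` stores: a link-time difference (mod 2^32). */
uint32_t initfn_offset(uint32_t func_link_addr, uint32_t base_link_addr)
{
    return func_link_addr - base_link_addr;
}

/*
 * What stext does with it:
 *   ldr r12, [r10, #PROCINFO_INITFUNC]   @ r12 = stored offset
 *   add r12, r12, r10                    @ r12 = offset + procinfo address
 * r10 is the procinfo address that is valid right now (physical while the
 * MMU is off), so the result is the setup function's currently valid address.
 */
uint32_t resolve_initfn(uint32_t procinfo_addr, uint32_t stored_offset)
{
    return procinfo_addr + stored_offset;
}
```

With the stored value 0xff968a3c and the entry at link address 0xc06adf80, resolve_initfn gives 0xc00169bc. If the same entry is found at physical address 0x306adf80, the result is 0x300169bc.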

    .type    v4wbi_tlb_fns, #object
ENTRY(v4wbi_tlb_fns)
.long v4wbi_flush_user_tlb_range
.long v4wbi_flush_kern_tlb_range
.long v4wbi_tlb_flags
.size v4wbi_tlb_fns, . - v4wbi_tlb_fns

arm920_cache_fns is defined in arch/arm/mm/proc-arm920.S by define_cache_functions arm920, which expands to:

    .align 2
.type arm920_cache_fns, #object
ENTRY(arm920_cache_fns)
.long arm920_flush_icache_all
.long arm920_flush_kern_cache_all
.long arm920_flush_kern_cache_louis
.long arm920_flush_user_cache_all
.long arm920_flush_user_cache_range
.long arm920_coherent_kern_range
.long arm920_coherent_user_range
.long arm920_flush_kern_dcache_area
.long arm920_dma_map_area
.long arm920_dma_unmap_area
.long arm920_dma_flush_range
.size arm920_cache_fns, . - arm920_cache_fns

v4wb_user_fns is defined in arch/arm/mm/copypage-v4wb.c:

struct cpu_user_fns v4wb_user_fns __initdata = {
.cpu_clear_user_highpage = v4wb_clear_user_highpage,
.cpu_copy_user_highpage = v4wb_copy_user_highpage,
};

Disassembling vmlinux shows the contents of __arm920_proc_info:

c06adf80 <__proc_info_begin>:
c06adf80: 41009200 #cpu_val
c06adf84: ff00fff0 #cpu_mask
c06adf88: 00000c1e #__cpu_mm_mmu_flags
c06adf8c: 00000c12 #__cpu_io_mmu_flags
c06adf90: ff968a3c #__cpu_flush
c06adf94: c04ed874 #arch_name
c06adf98: c04ed87b #elf_name
c06adf9c: 00000007 #elf_hwcap
c06adfa0: c04ed87e #cpu_name
c06adfa4: c06b4040 #proc
c06adfa8: c06b4034 #tlb
c06adfac: c06b402c #user
c06adfb0: c00168c0 #cache
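The two flag words in the dump (0x00000c1e and 0x00000c12) can be checked against the .long expressions in __arm920_proc_info. The sketch below ORs the bits together. The bit positions are the short-descriptor section bits as I understand them from arch/arm/include/asm/pgtable-2level-hwdef.h, and the function names are illustrative.

```c
#include <stdint.h>

/* First-level section descriptor bits (non-LPAE). */
#define PMD_TYPE_SECT       (UINT32_C(2) << 0)  /* entry describes a section */
#define PMD_SECT_BUFFERABLE (UINT32_C(1) << 2)  /* B bit */
#define PMD_SECT_CACHEABLE  (UINT32_C(1) << 3)  /* C bit */
#define PMD_BIT4            (UINT32_C(1) << 4)
#define PMD_SECT_AP_WRITE   (UINT32_C(1) << 10)
#define PMD_SECT_AP_READ    (UINT32_C(1) << 11)

/* __cpu_mm_mmu_flags: cached, buffered, read/write mapping for memory. */
uint32_t arm920_mm_mmuflags(void)
{
    return PMD_TYPE_SECT | PMD_SECT_BUFFERABLE | PMD_SECT_CACHEABLE |
           PMD_BIT4 | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ;
}

/* __cpu_io_mmu_flags: same access rights, but uncached and unbuffered. */
uint32_t arm920_io_mmuflags(void)
{
    return PMD_TYPE_SECT | PMD_BIT4 | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ;
}
```

The results are 0xc1e and 0xc12, which agree with the disassembly.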

For vexpress, the matching ".proc.info.init" entry is defined in arch/arm/mm/proc-v7.S. Only the relevant parts are kept here:

    define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1

    .section ".rodata"

    string    cpu_arch_name, "armv7"
string cpu_elf_name, "v7"
.align

.section ".proc.info.init", #alloc

/*
* Standard v7 proc info content
*/
.macro __v7_proc name, initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
ALT_SMP(.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
PMD_SECT_AF | PMD_FLAGS_SMP | \mm_mmuflags)
ALT_UP(.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
PMD_SECT_AF | PMD_FLAGS_UP | \mm_mmuflags)
.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | PMD_SECT_AF | \io_mmuflags
initfn \initfunc, \name
.long cpu_arch_name
.long cpu_elf_name
.long HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_FAST_MULT | \
HWCAP_EDSP | HWCAP_TLS | \hwcaps
.long cpu_v7_name
.long \proc_fns
.long v7wbi_tlb_fns
.long v6_user_fns
.long v7_cache_fns
.endm

/*
* ARM Ltd. Cortex A9 processor.
*/
.type __v7_ca9mp_proc_info, #object
__v7_ca9mp_proc_info:
.long 0x410fc090
.long 0xff0ffff0
__v7_proc __v7_ca9mp_proc_info, __v7_ca9mp_setup, proc_fns = ca9mp_processor_functions
.size __v7_ca9mp_proc_info, . - __v7_ca9mp_proc_info

After further expansion this becomes:

     string  cpu_v7_name, "ARMv7 Processor"
define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1

.section ".rodata"

string cpu_arch_name, "armv7"
string cpu_elf_name, "v7"
.align

.section ".proc.info.init", #alloc

/*
* ARM Ltd. Cortex A9 processor.
*/
.type __v7_ca9mp_proc_info, #object
__v7_ca9mp_proc_info:
.long 0x410fc090
.long 0xff0ffff0
ALT_SMP(.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
PMD_SECT_AF | PMD_FLAGS_SMP)
ALT_UP(.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
PMD_SECT_AF | PMD_FLAGS_UP)
.long PMD_TYPE_SECT | PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | PMD_SECT_AF
initfn __v7_ca9mp_setup, __v7_ca9mp_proc_info
.long cpu_arch_name
.long cpu_elf_name
.long HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_FAST_MULT | \
HWCAP_EDSP | HWCAP_TLS
.long cpu_v7_name
.long ca9mp_processor_functions
.long v7wbi_tlb_fns
.long v6_user_fns
.long v7_cache_fns
.size __v7_ca9mp_proc_info, . - __v7_ca9mp_proc_info

As with the 2440, some of the labels referenced here are defined as follows:

ca9mp_processor_functions: defined in arch/arm/mm/proc-v7.S by define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1, which expands to:

    .type    ca9mp_processor_functions, #object
.align 2
ENTRY(ca9mp_processor_functions)
.word v7_early_abort
.word v7_pabort
.word cpu_ca9mp_proc_init
.word cpu_ca9mp_proc_fin
.word cpu_ca9mp_reset
.word cpu_ca9mp_do_idle
.word cpu_ca9mp_dcache_clean_area
.word cpu_ca9mp_switch_mm
.word cpu_ca9mp_set_pte_ext
.word cpu_ca9mp_suspend_size
.word cpu_ca9mp_do_suspend
.word cpu_ca9mp_do_resume
.size ca9mp_processor_functions, . - ca9mp_processor_functions

v7wbi_tlb_fns: defined in arch/arm/mm/tlb-v7.S by define_tlb_functions v7wbi, v7wbi_tlb_flags_up, flags_smp=v7wbi_tlb_flags_smp, which expands to:

ENTRY(v7wbi_tlb_fns)
.long v7wbi_flush_user_tlb_range
.long v7wbi_flush_kern_tlb_range
ALT_SMP(.long v7wbi_tlb_flags_smp )
ALT_UP(.long v7wbi_tlb_flags_up )
.size v7wbi_tlb_fns, . - v7wbi_tlb_fns

v6_user_fns: defined in arch/arm/mm/copypage-v6.c:

struct cpu_user_fns v6_user_fns __initdata = {
.cpu_clear_user_highpage = v6_clear_user_highpage_nonaliasing,
.cpu_copy_user_highpage = v6_copy_user_highpage_nonaliasing,
};

v7_cache_fns: defined in arch/arm/mm/cache-v7.S by define_cache_functions v7, which expands to:

    .align 2
.type v7_cache_fns, #object
ENTRY(v7_cache_fns)
.long v7_flush_icache_all
.long v7_flush_kern_cache_all
.long v7_flush_kern_cache_louis
.long v7_flush_user_cache_all
.long v7_flush_user_cache_range
.long v7_coherent_kern_range
.long v7_coherent_user_range
.long v7_flush_kern_dcache_area
.long v7_dma_map_area
.long v7_dma_unmap_area
.long v7_dma_flush_range
.size v7_cache_fns, . - v7_cache_fns

Disassembling vmlinux shows the data of __v7_ca9mp_proc_info:

c06ee5fc <__v7_ca9mp_proc_info>:
c06ee5fc: 410fc090 #cpu_val
c06ee600: ff0ffff0 #cpu_mask
c06ee604: 00011c0e #__cpu_mm_mmu_flags
c06ee608: 00000c02 #__cpu_io_mmu_flags
c06ee60c: ffa2e260 #__cpu_flush
c06ee610: c0701b64 #arch_name
c06ee614: c0701b6a #elf_name
c06ee618: 00008097 #elf_hwcap
c06ee61c: c011c780 #cpu_name
c06ee620: c0958094 #proc
c06ee624: c09081dc #tlb
c06ee628: c095802c #user
c06ee62c: c0958000 #cache
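The flag words in this dump can be checked the same way as for the ARM920T. The sketch below assumes the non-LPAE definitions as I understand them from the kernel headers: PMD_SECT_AF is 0, PMD_FLAGS_SMP is write-back/write-allocate plus the shareable bit, and PMD_FLAGS_UP is plain write-back. The function names are illustrative.

```c
#include <stdint.h>

/* Section descriptor bits (non-LPAE). */
#define PMD_TYPE_SECT       (UINT32_C(2) << 0)
#define PMD_SECT_BUFFERABLE (UINT32_C(1) << 2)
#define PMD_SECT_CACHEABLE  (UINT32_C(1) << 3)
#define PMD_SECT_AP_WRITE   (UINT32_C(1) << 10)
#define PMD_SECT_AP_READ    (UINT32_C(1) << 11)
#define PMD_SECT_TEX(x)     ((uint32_t)(x) << 12)
#define PMD_SECT_S          (UINT32_C(1) << 16) /* shareable */
#define PMD_SECT_AF         UINT32_C(0)         /* assumption: no AF bit without LPAE */

#define PMD_SECT_WB   (PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
#define PMD_SECT_WBWA (PMD_SECT_WB | PMD_SECT_TEX(1))

/* Memory flags: ALT_SMP picks the first variant, ALT_UP the second. */
uint32_t v7_mm_mmuflags(int smp)
{
    uint32_t base = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
                    PMD_SECT_AP_READ | PMD_SECT_AF;

    return base | (smp ? (PMD_SECT_WBWA | PMD_SECT_S) : PMD_SECT_WB);
}

/* I/O flags: no cacheability bits at all. */
uint32_t v7_io_mmuflags(void)
{
    return PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | PMD_SECT_AF;
}
```

v7_mm_mmuflags(1) gives 0x11c0e and v7_io_mmuflags() gives 0xc02, matching the dump, which was taken from an SMP build.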

Back to head.S. With the contents of the ".proc.info.init" section covered, the next thing to look at is __lookup_processor_type:

 __lookup_processor_type:
adr r3, __lookup_processor_type_data
ldmia r3, {r4 - r6}
sub r3, r3, r4 @ get offset between virt&phys
add r5, r5, r3 @ convert virt addresses to
add r6, r6, r3 @ physical address space
1: ldmia r5, {r3, r4} @ value, mask
and r4, r4, r9 @ mask wanted bits
teq r3, r4
beq 2f
add r5, r5, #PROC_INFO_SZ @ sizeof(proc_info_list)
cmp r5, r6
blo 1b
mov r5, #0 @ unknown processor
2: ret lr
ENDPROC(__lookup_processor_type)

/*
* Look in <asm/procinfo.h> for information about the __proc_info structure.
*/
.align 2
.type __lookup_processor_type_data, %object
__lookup_processor_type_data:
.long .
.long __proc_info_begin
.long __proc_info_end
.size __lookup_processor_type_data, . - __lookup_processor_type_data

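Because the MMU is not yet enabled, every address the CPU issues is treated as a physical address. The kernel text, however, is linked to run from 0xC0008000, while on the 2440 physical memory spans 0x3000_0000 to 0x3400_0000, so accessing data through its link-time (virtual) address would make the program run off into nowhere.

The instructions from adr r3, __lookup_processor_type_data through add r6, r6, r3 therefore convert the virtual addresses of __proc_info_begin and __proc_info_end, stored in __lookup_processor_type_data, into physical addresses held in r5 and r6. The conversion is simple: adr yields the run-time (physical) address of __lookup_processor_type_data, the first word stored there (.long .) is its link-time (virtual) address, and the difference between the two is added to each virtual address.

The loop from label 1 to the ret then starts at r5 (the physical start address of the ".proc.info.init" section) and steps through it in increments of #PROC_INFO_SZ, looking for the proc_info_list that matches the cpu id in r9. The matching rule is simple. As shown earlier, the first two members of proc_info_list are cpu_val (loaded into r3) and cpu_mask (loaded into r4). Both are read and the code tests whether (r9 & cpu_mask) equals cpu_val. If so, a match has been found and the function returns with r5 holding the physical address of the matching proc_info_list. Otherwise it moves on to the next proc_info_list until the end of the section is reached. If nothing matched, r5 is set to 0 before returning.

Back to head.S.

5. The four instructions from adr r3, 2f to add r8, r8, r4 compute the start address of physical memory: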
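The matching loop can be written in C as follows. This is only a sketch of the logic described above, not the kernel's code: struct proc_info_entry is a cut-down, hypothetical stand-in for struct proc_info_list, and NULL plays the role of r5 = 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct proc_info_list (only the matched fields). */
struct proc_info_entry {
    uint32_t cpu_val;
    uint32_t cpu_mask;
    const char *cpu_name;
};

/*
 * Walk [begin, end) and return the first entry whose cpu_val equals the
 * masked cpuid, or NULL if no entry matches.
 */
const struct proc_info_entry *
lookup_processor_type(const struct proc_info_entry *begin,
                      const struct proc_info_entry *end,
                      uint32_t cpuid)
{
    for (const struct proc_info_entry *p = begin; p < end; ++p) {
        if ((cpuid & p->cpu_mask) == p->cpu_val)
            return p;
    }
    return NULL;
}
```

With a table holding the ARM920T entry (0x41009200, 0xff00fff0) and the Cortex-A9 entry (0x410fc090, 0xff0ffff0), the IDs 0x41129200 and 0x414fc091 select the first and second entry respectively, because the masks ignore the variant and revision fields.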

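    adr    r3, 2f
ldmia r3, {r4, r8}
sub r4, r3, r4 @ (PHYS_OFFSET - PAGE_OFFSET)
add r8, r8, r4 @ PHYS_OFFSET
.ltorg
2: .long .
.long PAGE_OFFSET

First, adr obtains the physical address of label 2, where the virtual address of label 2 and 0xC000_0000 (PAGE_OFFSET) are stored. The difference between the label's physical and virtual addresses is then computed and added to 0xC000_0000, which gives the start address of physical memory. This assumes that the kernel was loaded at (start of physical memory + 0x8000) and is executing from there.

For example, after these instructions r8 is 0x3000_0000 on the 2440 and 0x6000_0000 on vexpress.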
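The same computation in C, using unsigned 32-bit arithmetic so that the subtraction wraps around exactly as it does in the registers. The function name and the label addresses used in the note are hypothetical, since the real address of label 2 depends on the build.

```c
#include <stdint.h>

/*
 * label_runtime_addr: where label 2 actually sits in memory (adr r3, 2f)
 * label_link_addr:    the value stored by `.long .` (link-time address)
 * page_offset:        the value stored by `.long PAGE_OFFSET`
 */
uint32_t compute_phys_offset(uint32_t label_runtime_addr,
                             uint32_t label_link_addr,
                             uint32_t page_offset)
{
    /* sub r4, r3, r4: PHYS_OFFSET - PAGE_OFFSET, modulo 2^32 */
    uint32_t delta = label_runtime_addr - label_link_addr;

    /* add r8, r8, r4: PAGE_OFFSET + delta = PHYS_OFFSET */
    return page_offset + delta;
}
```

If label 2 were linked at 0xC0008090 (a made-up address) and found at 0x30008090 at run time, the result would be 0x30000000. With a run-time address of 0x60008090 it would be 0x60000000.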

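6. bl __vet_atags checks whether the device tree blob passed in r2 is valid. If it is not, r2 is cleared to 0. The check tests whether the first 4 bytes at the address in r2 equal OF_DT_MAGIC: if so, the blob is valid, otherwise it is not.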
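A minimal C sketch of the magic check follows. It assumes the standard flattened device tree magic 0xd00dfeed, stored big-endian at the start of the blob, and covers only the check described above. The function names are illustrative, and NULL stands in for clearing r2.

```c
#include <stddef.h>
#include <stdint.h>

#define OF_DT_MAGIC UINT32_C(0xd00dfeed) /* FDT header magic, big-endian in memory */

/* Read a 32-bit big-endian value regardless of host endianness. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the blob pointer if it starts with the FDT magic, else NULL. */
const void *vet_dtb(const void *blob)
{
    if (blob == NULL)
        return NULL;
    return read_be32((const uint8_t *)blob) == OF_DT_MAGIC ? blob : NULL;
}
```

A buffer beginning with the bytes d0 0d fe ed passes the check, while the byte-swapped sequence ed fe 0d d0 does not.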

7. The calls to __fixup_smp and __fixup_pv_table are ignored for now.

8. bl __create_page_tables builds the section-mapped page table.

To be continued