When I create a new variable in a C++ program, e.g. a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
5 answers
#1
3
See the docs:
When a variable is declared, the memory needed to store its value is assigned a specific location in memory (its memory address). Generally, C++ programs do not actively decide the exact memory addresses where its variables are stored. Fortunately, that task is left to the environment where the program is run - generally, an operating system that decides the particular memory locations on runtime. However, it may be useful for a program to be able to obtain the address of a variable during runtime in order to access data cells that are at a certain position relative to it.
You can also refer to this article on Variables and Memory:
The Stack
The stack is where local variables and function parameters reside. It is called a stack because it follows the last-in, first-out principle. As data is added or pushed to the stack, it grows, and when data is removed or popped it shrinks. In reality, memory addresses are not physically moved around every time data is pushed or popped from the stack; instead the stack pointer, which as the name implies points to the memory address at the top of the stack, moves up and down. Everything below this address is considered to be on the stack and usable, whereas everything above it is off the stack, and invalid. This is all accomplished automatically by code that the compiler generates, and as a result it is sometimes also called automatic memory. Before C++11, the keyword auto could be used to request this kind of storage explicitly; since C++11, auto means type deduction instead. Normally, one declares variables on the stack like this:
void func () { int i; float x[100]; ... }
Variables that are declared on the stack are only valid within the scope of their declaration. That means when the function func() listed above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack: the operating system only allocates a certain amount of space to the stack. As each part of a program that is being executed comes into scope, the operating system allocates the appropriate amount of memory that is required to hold all the local variables on the stack. If this is greater than the amount of memory that the OS has allowed for the total size of the stack, then the program will crash. While the maximum size of the stack can sometimes be changed by compile time parameters, it is usually fairly small, and nowhere near the total amount of RAM available on a machine.
#2
2
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
#3
1
Assuming this is a local variable, it is allocated on the stack, i.e. in RAM. The compiler keeps track of the variable's offset on the stack. In the basic scenario, when a computation is performed with the variable, its value is loaded into one of the processor's registers and the CPU performs the computation. Afterwards the result is written back to RAM. In practice, optimizing compilers often keep variables entirely in registers, and modern processors put several levels of cache between the registers and RAM, so it can get quite complex.
Please note that the name "c" is no longer mentioned in the binary (unless you have debugging symbols). The binary then works only with memory locations. E.g. it would look like this (simple addition):
a = b + c
take value of memory offset 1 and put it in the register 1
take value of memory offset 2 and put it in the register 2
sum registers 1 and 2 and store the result in register 3
copy the register 3 to memory location 3
The binary doesn't know "a", "b" or "c". The compiler just said "b is in memory 1, c is in memory 2, a is in memory 3". And the CPU just blindly executes the commands the compiler has generated.
#4
0
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled at the level of machine code, which assembly language spells out in readable form. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".
#5
0
Let's say our program starts with a stack address of 4000000.
When you call a function, it will "allocate" stack space, depending on how much stack the function uses, like this:
Let's say we have 2 ints (8 bytes):
int function()
{
    int a = 0;
    int b = 0;
    return a + b; // a non-void function must return a value
}
then what's gonna happen in assembly is
MOV EBP,ESP
// Here we store the original value of the stack address (4000000) in EBP, so that ESP can be restored to 4000000 at the end of the function
SUB ESP, 8
// Here we "allocate" 8 bytes on the stack, which basically just decreases the address in ESP by 8
So our ESP address changed from 4000000 to 3999992.
That's how the program knows that the first int starts at stack address 3999992 and the second int runs from 3999996 up to (but not including) 4000000.
Even though this pretty much has nothing to do with the compiler, it's really important to know, because when you know how the stack is "allocated", you realize how cheap it is to do things like
char my_array[20000]; since all it's doing is sub esp, 20000, which is a single assembly instruction.
But if you actually use all those bytes, like memset(my_array, 0, 20000), that's a different story.