C# to IL 12 Arrays

Time: 2022-11-04 10:27:50

An array is a contiguous block of memory that stores values of the same type. These values
form an indexed collection. The runtime has built-in support for handling arrays. Vector is
another name for an array that has only one dimension and whose index starts at zero.
The element type of an array can be any type derived from System.Object. This includes
everything under the sun, excluding pointers, which are not allowed in this version of the
CLR. Nobody knows about the next version. An array is a subtype of System.Array and we are
given plenty of leeway in working with arrays. The newarr instruction is used only for
single dimensional arrays.

IL recognises the array data type. Thus, in the locals directive, we see an array of int32
called V_0. This is similar to the process of creating an array in C#, where we first declare
an array variable and then, to create the actual array, state its size. In IL, the size is
placed on the stack. IL uses newarr, which is similar to newobj, to create the array in
memory, whereas C# uses new for an array as well as for any other reference type. The data
type of the array members is also passed to the newarr instruction. Like newobj, newarr
places the reference of the array on the stack. Thereafter, V_0 is initialized with this
reference, and ldloc.0 pushes the reference back on the stack.
We will now explain the IL code generated for the statement a[1] = 10. With the array
reference already on the stack, the array index, in this case 1, is pushed, followed by the
value that the member is to be initialized to, i.e. 10. So, there are 3 items on our stack:
at the bottom, the array reference, then the array index and finally the new value of the
array member. These parameters are required by the instruction stelem.i4 to initialize an
array member.
The instruction ldelem.i4 does the reverse. To read the value of an array member, the array
reference is loaded on the stack, followed by the index of the member, and ldelem.i4
retrieves the value. As mentioned earlier, i4 stands for 4 bytes on the stack. Most
instructions have such a data type at the end of their name.
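
A minimal C# sketch of the statements discussed above, with comments naming the IL
instructions the text attributes to each line. The class and method names and the array
size of 2 are assumptions made for illustration.

```csharp
using System;

public static class ArrayBasics
{
    // Creates an int array, stores 10 in member 1 and reads it back.
    public static int StoreAndLoad()
    {
        int[] a;          // a local of type int32[] (V_0 in the text)
        a = new int[2];   // size on the stack, then newarr (size 2 is assumed)
        a[1] = 10;        // array reference, index, value, then stelem.i4
        return a[1];      // array reference, index, then ldelem.i4
    }

    public static void Main()
    {
        Console.WriteLine(StoreAndLoad()); // prints 10
    }
}
```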

The array class has a member called Length. This Length member in C# gets converted to
an IL instruction, ldlen, that requires an array reference on the stack and returns the
length. Array handling is very powerful in .NET because IL has an intrinsic ability to
understand arrays. In IL, the array has been made a first-class citizen.
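
As a small sketch of the point above, the Length property can be read like any other
property in C#. The class and method names are assumed for illustration.

```csharp
using System;

public static class LengthDemo
{
    // For a single dimensional array, the text says Length becomes the ldlen instruction.
    public static int CountOf(int[] a)
    {
        return a.Length;
    }

    public static void Main()
    {
        Console.WriteLine(CountOf(new int[3])); // prints 3
    }
}
```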

Here we have a small C# program that has been transformed into a large IL program. To
begin, we have created 5 locals instead of 1. Two of them, V_0 and V_2, are arrays and the
rest are mere ints. The two stelem.i4 instructions initialize the 2 array members as seen in
the above programs.
Now let us understand how IL deals with a foreach statement. ldloc.0 loads the reference
of the array on the stack. The instruction stloc.2 makes local V_2 hold the same array
reference as V_0. Then the array reference in V_2 is loaded on the stack. Finally, using
the instruction ldlen, the length of the array is determined.
The number 2 is now present on the stack. This represents the length of the array. It is
converted to occupy 4 bytes on the stack and is stored in local V_3, using the instruction
stloc.3. The number 0 is then placed on the stack using the ldc instruction. stloc pops
this value 0 into local V_4 and br branches to label IL_002d, where the value of local V_4,
0, is loaded. The value of local V_3, which stores the length of the array, i.e. 2, is also
loaded on the stack.
Since 0 is less than 2, the code at label IL_001c is executed. This loads the array reference
on the stack, then loads local V_4, which is the index. Finally, ldelem.i4 fetches the value
of member a[0].
Adding 1 to the local V_4 serves a dual purpose: one, to index the array for ldelem.i4
and the other, to stop the loop once we reach the length of the array stored in local
V_3. This is how a foreach statement is converted, step by step, into IL code.
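
The steps above can be sketched in C# by writing the loop out by hand. The names copy,
length and index stand in for the locals V_2, V_3 and V_4, and the member values 10 and 20
are assumed. A current compiler may emit a slightly different shape of IL than the one
described in the text.

```csharp
using System;

public static class ForeachLowering
{
    // The loop as it is written in C#.
    public static int SumWithForeach(int[] a)
    {
        int sum = 0;
        foreach (int i in a)
            sum += i;
        return sum;
    }

    // The same loop, written the way the IL in the text carries it out.
    public static int SumLowered(int[] a)
    {
        int sum = 0;
        int[] copy = a;               // stloc.2: V_2 holds the same reference as V_0
        int length = copy.Length;     // ldlen, then stloc.3
        for (int index = 0; index < length; index++)   // V_4 starts at 0
        {
            int i = copy[index];      // ldelem.i4
            sum += i;
        }
        return sum;
    }

    public static void Main()
    {
        int[] a = { 10, 20 };
        Console.WriteLine(SumWithForeach(a)); // prints 30
        Console.WriteLine(SumLowered(a));     // prints 30
    }
}
```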

A function with a params parameter accepts a variable number of arguments. How does the
compiler handle it?
As usual, we see object V_0, which is an instance of class zzz. Along with it is an array of
strings V_1, which we have not created. As the two arguments, i.e. the strings "hi" and
"bye", are to be passed to the function, IL first creates an array of size 2: the number 2
is placed on the stack and newarr creates the array. The reference to this array is pushed
onto the stack.
Using ldc.i4.0, index 0 is pushed on the stack, followed by the string "hi". Thereafter, the
instruction stelem, suffixed with the type, is executed. Here, the suffix ref means that the
member is an object reference. Thus, the temp array V_1's first or zeroth member gets the
value "hi", and the same process is repeated for the second array member. Thus, for a
params parameter, all the individual arguments are placed in one array and the function abc
is called with this array on the stack.
In the function abc, the first change is that the function accepts an array with the same
name as in C#. The .param directive, together with the custom directive that follows it,
uses the metadata to record that this parameter is a params array. The values "hi" and
"bye" are not stored in the metadata; the members of array b receive them only through the
array built by the caller. The .param with number 1 stands for the first parameter in the
function prototype. Here 0 stands for the return value and 1 stands for the first parameter,
that is our array.
We will explore the custom directive in detail later. The rest of the IL code loads the second
member of the array on the stack using ldelem.ref. This is similar in concept to stelem.ref.
Thus, the compiler does a lot of hard work in implementing the params modifier. To sum
up, it converts all the individual arguments into one array, and this array is placed on the
stack. IL has no notion of the params modifier; it sees only an array parameter. This is
also why the params parameter has to be the last entry in the parameter list. The ref
suffix is used to denote a reference element.
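
A short sketch of what the compiler does for a params call, assuming a function abc that
returns the second member instead of printing it. Calling abc with two strings and calling
it with a hand-built array of the same two strings are equivalent.

```csharp
using System;

public class zzz
{
    // b is an ordinary string[] as far as the function body is concerned.
    public string abc(params string[] b)
    {
        return b[1];   // ldelem.ref on the second member
    }

    public static void Main()
    {
        zzz a = new zzz();
        // The compiler builds a temporary string[2] for this call.
        Console.WriteLine(a.abc("hi", "bye"));                    // prints bye
        // Written out by hand, the call looks like this.
        Console.WriteLine(a.abc(new string[] { "hi", "bye" }));   // prints bye
    }
}
```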

Here, we will explain certain features of pointer handling in C# and IL. In the C# program
we have created an array of size 2 in the function abc and the array members are
initialised. The keyword fixed fixes the array in memory. For the purpose of efficiency,
the garbage collector can move things around in memory. By fixing the array in memory, we
prevent the garbage collector from moving it.
The address of the array is stored in a pointer to an int and the function pqr is called. This
function displays the value of the first member of the array and then changes it. The
change is reflected in the original array also. In the locals, we define our int array as usual,
but we have another variable V_1, which is also a pointer, but declared with a & and not
a *. This pointer is also pinned, which means that the garbage collector will not move the
memory it points to. If that memory were moved, we could no longer keep track of its
location. Thus, a fixed becomes a pinned location.
The array and an index are pushed on the stack and ldelema converts them into the address
of that member. V_1 is initialized to this value and function pqr is called. In the function
pqr, the [] is converted into pointer arithmetic. Thus, the address held in the pointer is
loaded on the stack. Then, the numbers 4 and 1 are placed on the stack because the size of
an int is 4 and the array index is 1. After multiplying them, the product is added to the
address to get the location of the member. The array members are then displayed.
The same logic is applied to change a value. Whether a[1] or *(a+1) is used, the IL code
remains the same.
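
The address arithmetic described above can be tried without the unsafe compiler switch.
The sketch below is not the fixed statement from the text: it swaps in GCHandle to pin the
array and the Marshal class to read and write through the address. The member values 10
and 20 and the new value 100 are assumed.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    // Returns { value read at offset 4, value of a[0] after the write }.
    public static int[] ReadAndChange()
    {
        int[] a = { 10, 20 };
        // Pin the array so that the garbage collector cannot move it.
        GCHandle handle = GCHandle.Alloc(a, GCHandleType.Pinned);
        try
        {
            IntPtr p = handle.AddrOfPinnedObject();      // address of a[0]
            int offset = sizeof(int) * 1;                // 4 * 1, as in the IL
            int second = Marshal.ReadInt32(p, offset);   // like *(p + 1)
            Marshal.WriteInt32(p, 0, 100);               // like *p = 100
            return new int[] { second, a[0] };
        }
        finally
        {
            handle.Free();                               // unpin the array
        }
    }

    public static void Main()
    {
        int[] r = ReadAndChange();
        Console.WriteLine(r[0]); // prints 20
        Console.WriteLine(r[1]); // prints 100
    }
}
```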

The array s is an array of three strings. We have declared t as an array of objects but
initialised it to an array of strings, which is perfectly legal in C#. We then initialised the
members of t to a null, a string and a yyy object respectively. The runtime knows that even
though t is declared as an array of objects, it was initialized to an array of strings. Its
members can only be strings or null.
The IL code is very straightforward. It uses newarr to create an array of strings. Then it
uses stloc.1 to initialize V_1 or array t. Thereafter, stelem.ref is used to initialize the
individual array members. However, the last stelem.ref checks the data type of the value at
runtime, finds that a yyy object is not a string and throws an ArrayTypeMismatchException.
The code used for throwing the exception is not present in the array class at all. It is in
stelem.ref and we are not privy to this code.
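
A minimal sketch of the situation described above, assuming an empty class yyy. The
exception is caught so that the program can report it instead of terminating.

```csharp
using System;

public class yyy { }

public static class CovarianceDemo
{
    // Returns true if storing a yyy object in the string array throws.
    public static bool StoreThrows()
    {
        string[] s = new string[3];
        object[] t = s;            // legal: an array of strings is also an array of objects
        t[0] = null;               // fine
        t[1] = "hi";               // fine, a string
        try
        {
            t[2] = new yyy();      // the runtime type check in stelem.ref fails here
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreThrows()); // prints True
    }
}
```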

Had the compiler been a little more concerned about exceptions, it would have prevented
the above program from throwing one at runtime, by spotting the error at compile time
itself. We have the same situation as before. The array t is an array of objects, but
initialized to an array of strings. The member t[0] is initialized to a yyy object, but now with
a cast. This cast calls the string operator, i.e. the function op_Implicit, which returns a
string. As the cast is not stated explicitly in the second case, the function op_Implicit is
not called to convert the yyy object into a string. The compiler should have noticed this at
compile time and reported an error. But it ignores this completely, and the exception is
thrown at runtime. Sometimes compilers do not behave as intelligently as expected.
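
The two cases can be sketched as follows. The body of the string operator and the text it
returns are assumptions; only the presence of an implicit conversion to string is taken
from the description above.

```csharp
using System;

public class yyy
{
    // Compiled to a static function called op_Implicit.
    public static implicit operator string(yyy y)
    {
        return "converted";
    }
}

public static class CastDemo
{
    // With the cast, op_Implicit runs and a string is stored in the array.
    public static string WithCast()
    {
        object[] t = new string[3];
        t[0] = (string)new yyy();
        return (string)t[0];
    }

    // Without the cast, the yyy object is treated as a plain object and the store fails.
    public static bool WithoutCastThrows()
    {
        object[] t = new string[3];
        try
        {
            t[0] = new yyy();
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WithCast());           // prints converted
        Console.WriteLine(WithoutCastThrows());  // prints True
    }
}
```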

This is quite a huge program. The explanation is slightly complicated but, without
understanding IL code, it is next to impossible to understand the nitty-gritty of C#.
Let us tread one step at a time. This example demonstrates some basic concepts of C#
programming. We first create an array of objects called a, of size 3, and initialize its
members to two numbers and one string. Remember that everything in the .NET world is an
object. Then we have another object o that is initialized to a. We do not get an error, but
you need to bear in mind that a is an array and o is an object that now stores a reference
to an array. We call the function F four times:

• first with the object a, which is an array.
• then with the same object cast to an object.
• then with the object o.
• finally with the object o cast to an array of objects.
The function F accepts the parameter in an array of objects called b. The first member b[0]
is stored in an object called o. The full name of the type of this object and the length of
the array are printed using the WriteLine function.
In the first case, the array of 3 objects is placed on the stack as it is. Its first member
is an int, so the name is System.Int32 and the size of the array is 3.
In the second case, as the array is cast to an object, it becomes the only member of a new
array. The name is System.Object[] and the size is 1.
The third case has an object placed on the stack, which is received in an array of objects.
The size is displayed as 1 since the object is wrapped in a new array of size 1.
In the last case, the cast recovers the original array of 3 members that o was equated to,
and thus the array size is 3.
Up to the last stelem.ref statement, the 3 array members are merely being initialized to
the values 1, Hello and 123. The local V_0 is array a and local V_1 refers to object o. As it
is an array of objects, the string does not pose any problems, but since the numbers are
value types, they have to be first converted to a reference type using the box instruction.
The first call simply places the array stored in local V_0 on the stack. The second call
places 1 on the stack and then creates a new array of size 1 using newarr. It stores this
new array in local V_2 and then loads the value of local V_2, which is an array of objects,
on the stack. Then, it loads a 0 and the first main array containing 3 members on the stack.
stelem.ref is used to initialize the first member of V_2 to this value. This local is then
placed on the stack. See what a simple cast does.
Similarly, in the third case we create an array of size 1, store it in a temporary local and
then place it on the stack. Then, we place 0 and the local V_1 on the stack and use
stelem.ref to store V_1 as its first member for the function. The last call simply places
the object V_1 on the stack and calls castclass.
Function F is straightforward while performing its job. Ask yourself whether it was the C#
code that enabled you to grasp the program or was it the IL code?
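
The four calls can be sketched as below. Here F returns the text instead of printing it so
that the results are easy to compare; this and the class name are assumptions made for
illustration.

```csharp
using System;

public static class ParamsObjectDemo
{
    // Reports the type of the first member and the length of the array received.
    public static string F(params object[] b)
    {
        object o = b[0];
        return o.GetType().FullName + " " + b.Length;
    }

    public static void Main()
    {
        object[] a = { 1, "Hello", 123 };   // the two ints are boxed
        object o = a;

        Console.WriteLine(F(a));            // prints System.Int32 3
        Console.WriteLine(F((object)a));    // prints System.Object[] 1
        Console.WriteLine(F(o));            // prints System.Object[] 1
        Console.WriteLine(F((object[])o));  // prints System.Int32 3
    }
}
```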

Our array above has only 3 members, whereas we tried to store a value in the seventh
member. Whenever we exceed the bounds of an array, we will get an
IndexOutOfRangeException at runtime. Thus, be careful in dealing with arrays. Do not
cross the picket line. We store values in an array and index them, so that we can retrieve a
single item by position.
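
A small sketch of the same mistake in C#, assuming an int array and the value 10. The
exception is caught so that the outcome can be printed.

```csharp
using System;

public static class BoundsCheckDemo
{
    // Returns true if writing to the seventh member of a 3 member array throws.
    public static bool StorePastEndThrows()
    {
        int[] a = new int[3];
        try
        {
            a[6] = 10;             // index 6 is the seventh member
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StorePastEndThrows()); // prints True
    }
}
```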

We have different instructions for dealing with value types and arrays. Arrays are nothing
but a number of variables stored together in memory. The instruction ldelema takes two
parameters on the stack. The first is the array reference, that is V_0, and the second is
the index of the member whose memory location is desired.
After running the instruction, we have on the stack the address of the member at the
specified array index. The instruction ldelema requires the data type of the array, because
the offset of the members of the array is decided by the data type. The instruction stobj
stores the value in that memory location, thereby initializing the first member of the array
to 10.
To display the first member, the address is placed on the stack and ldobj is used to
retrieve the value. The instructions ldobj and stobj have nothing to do with arrays as such.
They deal with reading a value type from a memory location and placing it on the stack, and
vice versa. Thus, used in this way, they work only with arrays of value types.
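
C# offers a source-level view of the same idea through ref locals, available from C# 7.0
onwards. In the sketch below, taking ref a[0] asks for the address of the member, which is
the job the text gives to ldelema. The compiler is free to pick a different store
instruction than stobj for an int, so treat the comments as an approximation.

```csharp
using System;

public static class ElementAddressDemo
{
    // Writes 10 to member 0 through its address and reads it back from the array.
    public static int WriteThroughAddress()
    {
        int[] a = new int[2];
        ref int slot = ref a[0];   // address of member 0 (array reference plus index)
        slot = 10;                 // store through the address
        return a[0];               // the array itself has changed
    }

    public static void Main()
    {
        Console.WriteLine(WriteThroughAddress()); // prints 10
    }
}
```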

Since we placed a null array reference on the stack, we get a NullReferenceException
error. We are basically simulating some of the exceptions that arrays can throw at us.
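
The same exception can be provoked from C#, as in this sketch. The stored value 10 is
assumed.

```csharp
using System;

public static class NullArrayDemo
{
    // Returns true if storing through a null array reference throws.
    public static bool StoreThrows()
    {
        int[] a = null;
        try
        {
            a[0] = 10;             // there is no array behind the reference
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreThrows()); // prints True
    }
}
```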

Just as we used the ldlen instruction earlier, we could have instead called the get_Length
function, which is the accessor of the Length property of the Array class. The choice is
yours, but as we demonstrated earlier, the Length property is converted to the ldlen
instruction by the C# compiler, as it is far more efficient. At the end of the day, the
get_Length function does the same thing. IL does not have instructions that can handle
arrays other than vectors. Thus, multi-dimensional arrays, also called general arrays, are
created using array functions.

One area where C# excels is array handling. This is only because IL understands arrays
internally. Let us now find out how IL handles two dimensional arrays.
A two dimensional array is declared in the same way that a normal array is declared, and
the dimensions are stated with the new keyword. The array index starts from 0 and not
from 1. In IL, to declare a two dimensional array, there is a special syntax, i.e. a 0
followed by 3 dots, twice, in the locals directive. The two array dimensions are placed on
the stack and newobj is called. It is not newarr. newobj calls the constructor of the two
dimensional array class, which takes two parameters. The return value is then stored in
local V_0.
To store a value in a two dimensional array, the reference to the array, stored in V_0, is
loaded on the stack, followed by the two indexes, using ldc. Thereafter the value that the
array member is to be initialized to is placed on the stack. The function Set of the same
int array class is called with these four items on the stack.
Conversely, to fetch a value, the function Get is called with 3 items on the stack, the
array reference and the 2 index values. Thus, multi-dimensional arrays are built using
array class functions, and not IL instructions, which are used to build single dimensional
arrays. The rank of an array is defined as the number of dimensions of the array. The
runtime expects at least a rank of 1.
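
A minimal C# sketch of a two dimensional array, with comments relating each statement to
the IL described above. The dimensions 2 and 3, the indexes and the value 10 are assumed.

```csharp
using System;

public static class TwoDimDemo
{
    // Stores 10 at [1, 2] and reads it back.
    public static int StoreAndLoad()
    {
        int[,] a = new int[2, 3];  // two sizes on the stack, then newobj on the array constructor
        a[1, 2] = 10;              // array reference, two indexes, value, then a call to Set
        return a[1, 2];            // array reference, two indexes, then a call to Get
    }

    // The rank is the number of dimensions.
    public static int RankOf()
    {
        return new int[2, 3].Rank;
    }

    public static void Main()
    {
        Console.WriteLine(StoreAndLoad()); // prints 10
        Console.WriteLine(RankOf());       // prints 2
    }
}
```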

A general purpose array has an upper bound and a lower bound. Unfortunately, as of now,
the runtime does not check the bounds stated in the declaration. Here, the first dimension
has a lower bound of 1 and an upper bound of 3. You can choose the bounds you desire.

An array can have any rank. The above array is a three dimensional one and has a rank of
3. So, we have to use the array handling functions to work with it. The rank of an array
is declared by using commas between the square brackets. The number of commas plus
one is the rank of an array. If no specific bounds are supplied, the default is 0 for the lower
bound and infinity for the upper bound.
You can specify none, one or both bounds. The CLR, in this version, ignores all the bounds
information you provide in the declaration, and only pays heed to the numbers placed on the
stack at the time of creation of the array. Here, you have to supply all the information.
Only those arrays that have a lower bound of 0 in all their dimensions are CLS compliant.
In the above example, three sizes are placed on the stack and the array constructor is
called with three values. We are not allowed to use newarr, as the above array is not a
vector. Now to set a member to a value, the three index values are placed on the stack
in a specific order. The same Set function is called, but this time with four parameters,
the three indexes and the value.
The same rules are relevant for the Get function also. The point that we want to make is
that the magnitude of the rank has no effect on the way the array is handled. No
substantial changes are required.
There are two array constructors that can be used. The first takes the same number of
parameters as the rank of the array. The second constructor takes twice the number of
parameters as the rank of the array. In the second type of constructor, the first two
parameters specify the lower bound and the length of the first dimension, the next two
parameters specify the lower bound and the length of the second dimension, and so on. The
first constructor always assumes the lower bound to be zero.
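
As a sketch of the point that rank changes very little, here is a three dimensional array
in C#. The sizes 2, 3 and 4 are assumed; the Rank, GetLength and GetLowerBound members
belong to System.Array.

```csharp
using System;

public static class ThreeDimDemo
{
    // Returns { rank, length of each dimension, lower bound of the first dimension }.
    public static int[] Shape()
    {
        int[,,] a = new int[2, 3, 4];   // two commas, so the rank is 3
        return new int[]
        {
            a.Rank,
            a.GetLength(0), a.GetLength(1), a.GetLength(2),
            a.GetLowerBound(0)          // 0, as no lower bound was supplied
        };
    }

    // Set and Get work the same way as for a rank of 2, with one more index.
    public static int StoreAndLoad()
    {
        int[,,] a = new int[2, 3, 4];
        a[1, 2, 3] = 10;
        return a[1, 2, 3];
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Shape())); // prints 3 2 3 4 0
        Console.WriteLine(StoreAndLoad());            // prints 10
    }
}
```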

We then change the index values 6, 5 in the above two lines to 1, 2, and an exception is
thrown at us.

An array with lower and upper bounds, having a rank of 2, is created. The first
dimension starts at 5 and ends at 10. Thus, on the stack is placed first the lower bound,
i.e. 5, and then the length of the dimension. The upper bound itself is never placed on the
stack. As the dimension starts at 5 and ends at 10, the length is calculated as follows:
10 - 5 + 1 = 6 (i.e. the upper bound - lower bound + 1). The same rule holds true for the
next dimension.
The rest of the code remains the same. When the index values 6, 5 of the array member are
changed to 1, 2, an exception is thrown. This is because the array bounds for the first
dimension are 5 to 10 and for the second dimension are 3 to 7. Any attempt to cross the
array bounds in any direction generates an exception.
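
C# has no syntax for non-zero lower bounds, but the same array can be sketched with
Array.CreateInstance, which takes the lengths and the lower bounds as two separate arrays.
This is a different route from the constructor call in the IL; the stored value 10 is
assumed.

```csharp
using System;

public static class LowerBoundDemo
{
    // First dimension 5 to 10 (length 6), second dimension 3 to 7 (length 5).
    static Array Create()
    {
        return Array.CreateInstance(typeof(int), new int[] { 6, 5 }, new int[] { 5, 3 });
    }

    // Stores 10 at [6, 5] and reads it back.
    public static int StoreAndLoad()
    {
        Array a = Create();
        a.SetValue(10, 6, 5);
        return (int)a.GetValue(6, 5);
    }

    // Upper bound = lower bound + length - 1.
    public static int UpperBound(int dimension)
    {
        return Create().GetUpperBound(dimension);
    }

    // Index 1, 2 lies below both lower bounds, so the store throws.
    public static bool OutOfBoundsThrows()
    {
        Array a = Create();
        try
        {
            a.SetValue(10, 1, 2);
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreAndLoad());       // prints 10
        Console.WriteLine(UpperBound(0));        // prints 10
        Console.WriteLine(UpperBound(1));        // prints 7
        Console.WriteLine(OutOfBoundsThrows());  // prints True
    }
}
```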

Let us explore jagged arrays, where an array member can contain another array of a
different length. We are creating an array that has an irregular shape. In C#, the syntax to
create an array of arrays consists of two pairs of square brackets, [][]. We first create
the array using only the first dimension. This is done by using newarr and stating an array
data type as a parameter. We then initialize V_0 with this array reference.
Now, since we have to create two separate one dimensional arrays, we first place the array
reference on the stack. Then we place the index of the array member we want to initialise,
followed by the size of the new array. Finally, we call newarr to create an array of ints and
place the reference on the stack. stelem.ref is used to initialize the array member with this
array reference. The same is repeated for the second member a[1] also.
The instruction ldlen returns the length of the array. For the main array, its reference is
placed on the stack using ldloc.0. For the second length, ldelem.ref is used to first fetch
the reference of the array out of the first array member a[0], and then ldlen is used to
obtain the length.
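
A minimal sketch of the jagged array described above. The inner sizes 3 and 5 are assumed;
the comments name the instructions the text associates with each statement.

```csharp
using System;

public static class JaggedLengthDemo
{
    // Returns { length of the main array, length of a[0], length of a[1] }.
    public static int[] Lengths()
    {
        int[][] a = new int[2][];  // newarr with an array type as the member type
        a[0] = new int[3];         // reference, index, size, newarr, then stelem.ref
        a[1] = new int[5];         // the same again for the second member
        return new int[]
        {
            a.Length,              // ldloc.0, then ldlen
            a[0].Length,           // ldelem.ref fetches the inner array, then ldlen
            a[1].Length
        };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Lengths())); // prints 2 3 5
    }
}
```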

The above example is similar to its predecessor, though it is more elaborate and complete.
A jagged array is created that is made of two arrays of sizes 2 and 3 respectively. In C#,
they can be initialized in one stroke. IL does it the hard way. To fetch the value of
a[1][2], it places the reference of the array on the stack. Then it places 1, the first
array index, on the stack. Thereafter, ldelem.ref is used to obtain an array reference.
Thus, at first an array reference is pushed on the stack. Then 2 is placed on the stack, and
ldelem.i4 is used to get the member at index 2 of this new array. A jagged array is treated
as an array whose members contain other independent arrays.
An array of arrays is different from a multi dimensional array. A multi dimensional array
forms one memory block, whereas an array of arrays holds references to other arrays in
memory. Thus, an array of arrays needs to make an extra indirection to reach the final
element.
We can also use pointers with arrays. The salient feature of an array of arrays is that the
first array merely stores the addresses of other arrays. The disadvantage of a multi
dimensional array is the fact that every row has to be of the same size.
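
A short sketch of initializing a jagged array in one stroke and fetching a[1][2]. The
member values 1 to 5 are assumed.

```csharp
using System;

public static class JaggedInitDemo
{
    // Two inner arrays of sizes 2 and 3, created and filled in a single statement.
    public static int Fetch()
    {
        int[][] a =
        {
            new int[] { 1, 2 },
            new int[] { 3, 4, 5 }
        };
        // ldelem.ref fetches the inner array a[1]; ldelem.i4 then fetches its member 2.
        return a[1][2];
    }

    public static void Main()
    {
        Console.WriteLine(Fetch()); // prints 5
    }
}
```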

Here, we shall see how to create an array a of type [][][]. We first create a local a of type
array of array of array. Thus, we have two levels of indirection. We want the first or main
array to have a size of 5, i.e. it should be able to store the references of 5 arrays in memory.
The instruction ldc places the size 5 of this array on the stack. Thereafter newobj is used to
create the first dimension of this array. The instruction stloc a stores the reference in
the local a and ldloc a puts it back on the stack.
Subsequently two values are placed on the stack. One is the index of the first member a[0]
and the other is the size of the array that this member should point to, i.e. 3. newobj
creates an array of type int32[][]. To store it in a[0], the Set function is used. This
function requires the index of the array as the first parameter. Hence, 0 is placed on the
stack before the size, even though newobj does not require it. It simplifies the call of the
Set function.
The next thing required is an int32[] to store in our int32[][]. So, the array a is placed again
on the stack and 0 is used to obtain the array that has just been created. The
Get function does the job of retrieving values. Then, as before, 1 is placed on the stack,
followed by the size of the new array, i.e. 10. Finally, newobj creates a simple array int32[]
and places it on the stack, which is then stored using the Set function.
Remember that the value 1 has already been placed on the stack. To execute the operation
a[0][1][5] = 100, the member a[0] is required. So, the array reference a is placed on the
stack, followed by 0, and the Get function is called.
To access a[0][1], as the array a[0] is already on the stack, all that is
required is placing 1 on the stack and calling Get again. Now, to store the value in the
member a[0][1][5], 5 is loaded on the stack, followed by the value 100. To fetch the value
of member a[0][1][5], the same procedure as before is followed. That is
• load the array reference on the stack.
• obtain the member 0 by using Get.
• obtain the member 1 of this array.
• finally obtain the member 5 of this array.
The logic is the same as described earlier.
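
The same construction written in C#, using the sizes 5, 3 and 10 and the value 100 from
the walkthrough above. The class and method names are assumed.

```csharp
using System;

public static class TripleJaggedDemo
{
    // Builds only the arrays needed to reach a[0][1][5], stores 100 there and reads it back.
    public static int StoreAndLoad()
    {
        int[][][] a = new int[5][][];   // main array: room for 5 references
        a[0] = new int[3][];            // a[0] points to an int[][] of size 3
        a[0][1] = new int[10];          // a[0][1] points to an int[] of size 10
        a[0][1][5] = 100;               // fetch a[0], fetch member 1, then store at index 5
        return a[0][1][5];              // the same path, ending in a load
    }

    public static void Main()
    {
        Console.WriteLine(StoreAndLoad()); // prints 100
    }
}
```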

This example builds upon the earlier example, which has a function that accepts a variable
number of arguments. In C#, __arglist enables us to implement a function that accepts a
variable number of arguments.
Internally, in IL, the function is marked with a vararg modifier, and the ArgIterator class
is used to display the values in a loop.