The code of a data type is implemented by a method, which is executed by the Execution
Engine. The CLR offers a large number of services to support the execution of code.
Any code that uses these services is called managed code. Managed code allows the CLR to
provide a set of features such as handling exceptions. It also makes sure that the code is
verifiable. Only managed code has access to managed data.
There is no rule in the IL book that prevents a method from being global. It can certainly be
written outside a class.
In fact we can write the smallest IL program without using the class directive. It is
mandatory to have a function with the entrypoint directive. Thus, had the designers of C#
so desired, they could have provided the facility of global functions, but they chose not to.
They decided, in their infinite wisdom, that all functions should be placed within a class.
There is no such restriction imposed by IL.
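The point about global methods can be sketched as follows. This is a minimal illustration, not the original listing; the assembly name smallest is an assumption, and vijay is simply the entry point name used elsewhere in this chapter.

```il
// A global method acting as the entry point; no .class directive is used.
.assembly extern mscorlib {}
.assembly smallest {}          // assembly name is hypothetical

.method static void vijay() cil managed
{
    .entrypoint
    ldstr "hi"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

Assembling the file with ilasm should yield an executable that prints hi, even though no class was ever declared.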
The CLR recognizes three types of methods: static, instance and virtual. In addition, the
runtime automatically calls some special functions: type initializers (static constructors),
which are named .cctor, and instance constructors, which are named .ctor.
A method in IL is uniquely identified by its signature. A signature consists of five parts:
• The name of the method
• The type or class that the method resides in
• The calling convention used
• The return type
• The parameter types.
For people like us, who are familiar with the world of C, C++ and Java, the concept of a
method signature depending upon the return type of a function is alien.
Here, we have two functions, both named a2, which differ only in the type of return value.
This is perfectly valid in IL. The reason being that when calling a method in IL, we always
have to state the return type. But what is allowed in IL, may be taboo in C#.
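A sketch of two methods that differ only in return type is given below. The class name zzz and the method a2 follow the names used in this chapter; the assembly name and the returned values are assumptions made for illustration.

```il
.assembly extern mscorlib {}
.assembly rettype {}           // assembly name is hypothetical

.class public zzz extends [mscorlib]System.Object
{
    // Same name, same (empty) parameter list, different return types.
    .method public static int32 a2() cil managed
    {
        ldc.i4.s 10
        ret
    }
    .method public static string a2() cil managed
    {
        ldstr "ten"
        ret
    }
    .method public static void vijay() cil managed
    {
        .entrypoint
        // The return type written in the call picks the overload.
        call int32 zzz::a2()
        call void [mscorlib]System.Console::WriteLine(int32)    // prints 10
        call string zzz::a2()
        call void [mscorlib]System.Console::WriteLine(string)   // prints ten
        ret
    }
}
```

Because every call instruction spells out the full signature, return type included, the runtime never has to guess which a2 is meant.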
Method overloading is a concept where the same function name appears in a class, more
than once. In fact, you may not have clearly observed, in the above programs, the this
pointer is not passed to the global functions. Even then, things worked well.
The reason for this is that generally, global functions are static by default. In fact, static
functions are found in classes, value types and interfaces. Static functions always have a
body associated with them.
The second type of method, very commonly used, is the instance method. These are functions
associated with an instance of a class. In this version of the CLR, we cannot declare them
in interfaces. Unlike static methods, which are stand-alone methods and behave like global
functions, an instance function is always passed a pointer or reference to the data
associated with the object. Thus, it can use the this pointer to access a different set of data
each time.
A runtime exception is thrown because the call expects the method to be static, whereas our
method is an instance method. To avoid this runtime error, replace the modifier instance with
static.
The this pointer is of the same type as the class in which the method resides. We therefore,
have to create an instance of a class before we can execute any instance method from the
class.
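A minimal sketch of creating an object and then calling an instance method on it follows. The names zzz, abc, i and vijay are the ones this chapter uses; the assembly name and the value 5 are assumptions.

```il
.assembly extern mscorlib {}
.assembly inst {}              // assembly name is hypothetical

.class public zzz extends [mscorlib]System.Object
{
    .field public int32 i

    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
    .method public instance void abc() cil managed
    {
        ldarg.0                     // hidden first parameter: the this pointer
        ldfld int32 zzz::i
        call void [mscorlib]System.Console::WriteLine(int32)
        ret
    }
    .method public static void vijay() cil managed
    {
        .entrypoint
        newobj instance void zzz::.ctor()   // reference to a new zzz on the stack
        dup
        ldc.i4.5
        stfld int32 zzz::i                  // this.i = 5
        call instance void zzz::abc()       // prints 5
        ret
    }
}
```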
As a rule, all instance functions must have the this pointer as the first parameter.
Therefore, it is automatically added as a first hidden parameter. The this pointer can be a
null reference too.
Whenever we refer to a field in a type, through a function, the this pointer should first be
available on the stack. This facilitates access to the instance fields. This explains the above
error.
Here, we have placed a ldnull as the this pointer, and thus, are unable to access the
instance members. On commenting the ldnull, no error is generated.
The instruction newobj places a this pointer on the stack. Therefore, prior to using it,
ldarg.0 is checked for NULL. However, for a value type, the this pointer is a managed
pointer to the value type. Unlike static or virtual, instance is not an attribute of a
method. It is part of the calling convention of a method.
There are three ways to call a method in IL. These are: call, callvirt and calli. Two of these,
call and callvirt, have already been dealt with, in the past.
There are three other instructions that can be used to call a method in a special way.
These are jmp, jmpi and newobj. Every method that we call has its own evaluation stack.
The parameters to the function are placed on this stack, and instructions also obtain their
arguments from the same stack.
On the execution of an instruction, the result is also placed on the same stack. The
runtime creates and maintains this stack. When the method quits out, the stack is
released.
There is another stack that we do not concern ourselves with. This stack keeps track of the
method being called, and hence, is known as the call stack.
The last and final instruction in any function is the ret instruction. This instruction is
responsible for the method returning control back to the calling method. If a function
returns a value, it must be placed on the stack before ret is called. When quitting off a
method, the stack must not contain any value, other than the value to be returned.
We use the call instruction to call static or instance functions. Before the call instruction,
all the parameters to the method must be placed on the stack. The first argument to the
function is placed first. The only difference between calling a static and an instance
method is that the modifier instance is used for an instance method, whereas no modifier
is required for a static method.
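The difference between the two forms of call can be sketched as below. The methods diff and diff2 and the assembly name are hypothetical; they exist only to show the order in which arguments are pushed.

```il
.assembly extern mscorlib {}
.assembly callorder {}         // assembly name is hypothetical

.class public zzz extends [mscorlib]System.Object
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
    .method public static int32 diff(int32 a, int32 b) cil managed
    {
        ldarg.0        // a
        ldarg.1        // b
        sub
        ret
    }
    .method public instance int32 diff2(int32 a, int32 b) cil managed
    {
        ldarg.1        // a (argument 0 is the this pointer)
        ldarg.2        // b
        sub
        ret
    }
    .method public static void vijay() cil managed
    {
        .entrypoint
        ldc.i4.7                                      // first argument is pushed first
        ldc.i4.2
        call int32 zzz::diff(int32, int32)            // static: no modifier
        call void [mscorlib]System.Console::WriteLine(int32)   // prints 5

        newobj instance void zzz::.ctor()             // this pointer goes on first
        ldc.i4.7
        ldc.i4.2
        call instance int32 zzz::diff2(int32, int32)  // instance: modifier required
        call void [mscorlib]System.Console::WriteLine(int32)   // prints 5
        ret
    }
}
```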
Virtual functions have to be handled with care as they are runtime entities. With virtual
functions, the instruction callvirt is used in place of call. callvirt unlike call executes the
overriding version of the method.
We have pulled out this program from an earlier chapter, where we explained new, override
and virtual functions. The callvirt function calls the function abc from xxx, as it overrides
the one from the class yyy.
The reason being, in the class xxx, there is no modifier newslot for the function abc, hence
it is not a different abc from the one in the base class; it overrides it. With call, however,
the instruction simply calls abc from the class specified, as it does not understand modifiers
like virtual, newslot etc. The modifier instance is used with callvirt, as the this pointer
can, under no circumstances, be NULL.
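The contrast between callvirt and call can be sketched with the two classes named in the text. This is an illustrative reconstruction, not the original listing; the strings printed and the assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly virt {}              // assembly name is hypothetical

.class public yyy extends [mscorlib]System.Object
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
    .method public virtual instance void abc() cil managed
    {
        ldstr "yyy abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.class public xxx extends yyy
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void yyy::.ctor()
        ret
    }
    // No newslot here, so this abc overrides the one in yyy.
    .method public virtual instance void abc() cil managed
    {
        ldstr "xxx abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.method static void vijay() cil managed
{
    .entrypoint
    .locals init (class yyy V_0)
    newobj instance void xxx::.ctor()
    stloc.0
    ldloc.0
    callvirt instance void yyy::abc()   // prints xxx abc (overriding version)
    ldloc.0
    call instance void yyy::abc()       // prints yyy abc (exactly the method named)
    ret
}
```

The non-virtual call on a virtual method is not verifiable code, but a fully trusted program runs it.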
In the above example, the super class function abc from the class yyy is called, from the
function abc from class xxx. This facilitates reusing code defined in the super class.
A virtual function may want to call all code in the base class. In IL parlance, it is termed as
a super call. In the above code, we foresee a problem with callvirt as it will either call itself
over and over again, or give us the following exception:
The reason for the above error is that the this pointer refers to an object of class xxx and
not of class yyy. Thus, the instruction call is used and not callvirt.
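A super call of this kind might look like the sketch below. The classes xxx and yyy and the function abc come from the text; the printed strings and the assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly supercall {}         // assembly name is hypothetical

.class public yyy extends [mscorlib]System.Object
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
    .method public virtual instance void abc() cil managed
    {
        ldstr "yyy abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.class public xxx extends yyy
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void yyy::.ctor()
        ret
    }
    .method public virtual instance void abc() cil managed
    {
        ldstr "xxx abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ldarg.0
        call instance void yyy::abc()   // super call: call, not callvirt
        ret
    }
}

.method static void vijay() cil managed
{
    .entrypoint
    newobj instance void xxx::.ctor()
    callvirt instance void yyy::abc()   // prints xxx abc, then yyy abc
    ret
}
```

Replacing the inner call with callvirt would dispatch back to xxx::abc, since the object on the stack is an xxx, and the method would call itself without end.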
We have created an object like zzz using newobj. It places a reference to a zzz on the stack.
The this pointer then calls the instance function abc.
Here we have displayed "hi" and then an instance method pqr is called using the jmp
instruction.
After the method pqr finishes execution, control does not return to method abc. Instead,
control returns to vijay, which is the method that called abc. Thus the string "bye",
which follows the jmp in the method abc, does not get displayed.
The jmp instruction does not revert the control back to the method from where the
program initially branched out.
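The behaviour of jmp can be sketched as follows. For brevity this sketch uses global static methods rather than the instance method of the original program; the assembly name and the final message printed in vijay are assumptions.

```il
.assembly extern mscorlib {}
.assembly jump {}              // assembly name is hypothetical

.method static void pqr() cil managed
{
    ldstr "pqr"
    call void [mscorlib]System.Console::WriteLine(string)
    ret                         // returns to vijay, not to abc
}

.method static void abc() cil managed
{
    ldstr "hi"
    call void [mscorlib]System.Console::WriteLine(string)
    jmp void pqr()              // evaluation stack must be empty here
    ldstr "bye"                 // never reached
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}

.method static void vijay() cil managed
{
    .entrypoint
    call void abc()
    ldstr "back in vijay"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

The expected output is hi, pqr and back in vijay; the string bye never appears.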
The above program is similar to its predecessor, but it uses the instruction jmpi instead of
jmp. This instruction is similar to jmp, but differs in the following aspects:
• In the case of the jmp instruction, we supplied the method signature as an operand of
the instruction.
• In the case of the jmpi instruction, we first use the instruction ldftn to load the
address of the function pqr on the stack, and then call jmpi.
The jmp family of instructions executes a jump or a branch across a method. We can only
jump to the beginning of a method, and not to anywhere inside it. The signature of the
method that we intend to jump to, must be the same.
If the signature of the method being jumped to is not the same, the above exception is
thrown. The jmp instruction is not verifiable.
The method abc takes two ints as parameters. We have placed the constant 3 on the stack,
and then used the instruction starg to change the parameter j. Then, ldarg is used to place
the new value on the stack. Thereafter, we have called the WriteLine function to confirm that
the new value is 3. The jmp instruction is the next to be called.
Here we have not placed any parameters on the stack. The jmp instruction first places the
numbers 1 and 2 on the stack, and then, calls the function pqr, that simply displays the
parameters that have been passed.
Even though we have changed the parameter j, the change is not reflected in the called
function pqr. This is contrary to what the documentation states. The call does not pass
parameters to the next method. The instruction jmp does so.
If function pqr returns a value, it will be passed to the function vijay and not to abc. We
cannot place any values on the stack before executing the jump. Jumps can be executed
only between methods that have the same signatures.
We can call a method indirectly by first, placing its address on the stack, and then, using
the calli instruction. At first, the instruction ldftn places the address of a non-virtual
function on the stack. Like in the case of instance functions, the this pointer has to be
placed first on the stack, followed by the parameters to the functions. When we tried using
calli with the address of a virtual function, Windows generated an error.
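An indirect call through calli can be sketched as below. The sketch uses a static global function, so no this pointer is pushed; the parameter of pqr, the value 7 and the assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly indirect {}          // assembly name is hypothetical

.method static void pqr(int32 i) cil managed
{
    ldarg.0
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
}

.method static void vijay() cil managed
{
    .entrypoint
    ldc.i4.7                    // arguments go on the stack first
    ldftn void pqr(int32)       // then the address of the function
    calli void(int32)           // signature of the target, without a name
    ret                         // prints 7
}
```

For an instance function the this pointer would be pushed before the arguments, and the calli signature would carry the instance keyword.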
We use the newobj instruction to create a new instance, and also, call the constructor of a
class, which is nothing more than a special instance method.
The only difference between a constructor and an instance call is that we do not place the
this pointer on the stack for the constructor ourselves. newobj first creates the object, and
then automatically places the this pointer on the stack.
The newobj instruction places the this pointer on the stack before calling the constructor.
If we desire to call the constructor ourselves, we too need to place the this pointer on the
stack.
In the above program, we have changed the value of the field i to 1, then again changed it
to 2 using stfld and then displayed this value. Thereafter, we have called the constructor,
which changes the value back to 1 again. This proves that a constructor is no different
from any other function.
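The experiment described above might be reconstructed roughly as follows. The class zzz and field i follow the text; the assembly name and the local V_0 are assumptions, and calling a constructor a second time is not verifiable code.

```il
.assembly extern mscorlib {}
.assembly ctorcall {}          // assembly name is hypothetical

.class public zzz extends [mscorlib]System.Object
{
    .field public int32 i

    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ldarg.0
        ldc.i4.1
        stfld int32 zzz::i          // the constructor sets i to 1
        ret
    }
    .method public static void vijay() cil managed
    {
        .entrypoint
        .locals init (class zzz V_0)
        newobj instance void zzz::.ctor()
        stloc.0
        ldloc.0
        ldc.i4.2
        stfld int32 zzz::i          // change i to 2
        ldloc.0
        ldfld int32 zzz::i
        call void [mscorlib]System.Console::WriteLine(int32)   // 2
        ldloc.0                     // we supply the this pointer ourselves
        call instance void zzz::.ctor()
        ldloc.0
        ldfld int32 zzz::i
        call void [mscorlib]System.Console::WriteLine(int32)   // back to 1, as the text describes
        ret
    }
}
```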
A method definition is called a method head in IL. The head also functions as an interface
to other methods. The format of the head is as follows:
• It starts with a number of predefined method attributes.
• These are followed by an optional indication, specifying whether the method is an
instance method or not.
• Thereafter, the calling convention is specified.
• This is followed by the return type and a few more optional parameters.
• Finally, we state the name and the parameters to the method and the implementation
attributes.
Methods are instance by default. To change the default behavior, we use the modifiers
static or virtual. As of today, the return type cannot have any attributes, but who knows
what changes may take place tomorrow.
The code for the method is written in the method body. It can incorporate a large number
of directives.
The code that we write, gets converted into numbers. Every IL instruction is represented by
a number. The ldc.i4.3 instruction is known by the number 19 hex. This information is
available in the Instruction Set Reference. The directive emitbyte emits an unsigned 8 bit
number directly into the code section of the method.
Thus, we can use the opcode of an IL instruction directly in IL programs.
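A small sketch of .emitbyte follows. It emits the byte 19 hex in place of writing ldc.i4.3; the assembly name is an assumption.

```il
.assembly extern mscorlib {}
.assembly rawbyte {}           // assembly name is hypothetical

.method static void vijay() cil managed
{
    .entrypoint
    .emitbyte 0x19              // the opcode of ldc.i4.3, written as a raw byte
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 3
    ret
}
```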
The return value of the entrypoint function can either be void, int32 or unsigned int32.
This value is handed over to the Operating System. A value of ZERO normally indicates
success and any other value indicates an error. The entrypoint method is unique, meaning,
it can have private accessibility, and yet be accessed by the runtime.
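As a brief illustration, the sketch below marks a private static method as the entry point and hands the value zero back to the operating system. The class name zzz and the assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly exitcode {}          // assembly name is hypothetical

.class public zzz extends [mscorlib]System.Object
{
    // private, yet the runtime can still start the program here
    .method private static int32 vijay() cil managed
    {
        .entrypoint
        ldc.i4.0                // zero conventionally signals success
        ret
    }
}
```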
The .locals directive is used to create a local variable that can only be accessed from within
that method. Thus, it is used to store data that exists only for the duration of a method
call. After a method quits, all the memory allocated for a local is reclaimed by the runtime.
It is faster for the system to allocate memory on the stack, where locals get stored, than to
allocate memory on the heap for the fields. We cannot specify attributes for local variables,
like we do for parameters.
The .locals directive can be placed at the end of the code and does not have to be placed at
the beginning. Thus, in a sense, a forward reference is allowed here.
Remove the comments and a value of zero will be displayed.
There is some overlap in IL. If we use the modifier init in the locals directive, then all the
variables will be assigned their default values, depending upon their type. We have touched
upon this point earlier.
The same effect is seen when we use the directive .zeroinit. This applies to all the locals in
the method.
• If we place the comments, the variable i will be assigned whatever value is present on
the stack.
• If we remove the comments, the runtime initialises all the value types to ZERO and
all the reference types to NULL.
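The effect of the init modifier can be sketched as below. The local names i and s and the assembly name are assumptions; only the init form of the .locals directive is shown.

```il
.assembly extern mscorlib {}
.assembly localsinit {}        // assembly name is hypothetical

.method static void vijay() cil managed
{
    .entrypoint
    .locals init (int32 i, string s)
    ldloc.0
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 0
    ldloc.1
    ldnull
    ceq                         // is the string local a null reference?
    call void [mscorlib]System.Console::WriteLine(bool)    // prints True
    ret
}
```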
Some of the directives can only be used within certain entities. The directive .zeroinit can
only be used within a method and not outside. The assembler checks whether the directive
has been used at the right place or not. If not, it generates an error message that is hardly
informative.
You may accuse us of being repetitive, but there is no harm in refreshing our memory.
Class yyy is a base class and xxx the derived class. We have created a local of type yyy,
which is the base class, but initialized it to the class xxx, which is the derived class. A
better way to say it is, we are creating an object that looks like xxx, but storing it in a yyy
local.
callvirt calls the function abc from the class xxx despite it being called through the yyy
class. This is because the instruction callvirt resolves the method at runtime. In that
environment, the this pointer on the stack is of class xxx, and thus abc from the class xxx is
called. The virtual function has its own unique way of deciding on the pointer to be placed on
the stack.
If we remove the modifier virtual from the function abc in class xxx, then the function abc
will be called from the yyy class. Changing the newobj to yyy does not make a difference, as
the run time and compile time data types are then the same. When they differ, the run time
data type takes precedence over the compile time data type.
We add the modifier newslot to the function abc in class xxx as follows:
Here, from the point of view of the run time, the function abc is treated as a new function.
As there is no connection with the abc of class yyy, they are now treated as two distinct
functions. The abc of class yyy is called. Placing the modifier newslot on the function abc
in class yyy makes it a new function, distinct from any abc in the classes that yyy derives
from. As there is none, it makes no difference here.
The above program is pretty large. The only difference between this program and its
predecessor is that, we have added one more class www derived from xxx. We have created
two locals, one each of the types xxx and yyy, but the run time data type of both the locals
is a www object.
The functions abc are virtual throughout. When we call the functions abc through callvirt,
even though we are using the class prefix xxx and yyy, the function gets called from www.
This is so because the run time data type of the this pointer that has been passed is www.
Then, we make our first small change: We add a newslot to the function abc in class www.
The output now reads as follows:
This output has resulted as shown above because, newslot dissociates the function abc of
the class www, from the earlier abc functions. Thus, since the abc of class xxx is the
newest, it gets called.
Next, we add the modifier newslot to the function abc from class xxx and remove it from
the class www. The output now reads as follows:
Isn't the output fascinating? Now you probably can understand, as to why we are revisiting
virtual functions.
By adding the modifier newslot to the function abc in class xxx, we are creating two
families of abc:
• One that consists of only a single abc in class yyy
• Another made up of abc functions from classes xxx and www.
Thus, in every instance, the last member of the family gets called and, since the first family
has only one member, this single member i.e. class yyy, gets called.
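The two families can be sketched with the three classes from the text. This is an illustrative reconstruction: newslot appears only on abc in class xxx, and the printed strings, local names and assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly families {}          // assembly name is hypothetical

.class public yyy extends [mscorlib]System.Object
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
    .method public virtual instance void abc() cil managed
    {
        ldstr "yyy abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.class public xxx extends yyy
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void yyy::.ctor()
        ret
    }
    // newslot starts a second family of abc functions
    .method public newslot virtual instance void abc() cil managed
    {
        ldstr "xxx abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.class public www extends xxx
{
    .method public specialname rtspecialname instance void .ctor() cil managed
    {
        ldarg.0
        call instance void xxx::.ctor()
        ret
    }
    // overrides the abc of xxx, the nearest family
    .method public virtual instance void abc() cil managed
    {
        ldstr "www abc"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}

.method static void vijay() cil managed
{
    .entrypoint
    .locals init (class yyy V_0, class xxx V_1)
    newobj instance void www::.ctor()
    dup
    stloc.0
    stloc.1
    ldloc.0
    callvirt instance void yyy::abc()   // prints yyy abc (first family)
    ldloc.1
    callvirt instance void xxx::abc()   // prints www abc (second family)
    ret
}
```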
In the second case, the abc of class www gets called. Now let us add the newslot modifier to
function abc class www, without removing the one from class xxx.
The output now reads as follows:
Now, we have three families of abc functions. Each of them has only one function abc that
has nothing to do with the abc functions of the other families.
If we add the modifier newslot to the function abc in class yyy, we will not see any change
in the output. This is because, we are cutting off abc from its root, from class yyy onwards.
There is no function abc in any of the classes that yyy derives from. Hence, there is no
change in the output.
If we remove virtual from the function abc in class www, it has the same effect as adding
the modifier newslot. The virtual modifier on a function signifies that the address of the
function to be called should be read from the vtable. If we remove the virtual modifier from
function abc in class xxx, the output will be as follows:
This output has resulted because of the following:
The object created is a www type.
• In the first case, the vtable has the address of a www abc. The vtable stores a single
address of every virtual function. The runtime checks for the compile time data type of
the pointer and on examining, it looks like yyy. Within yyy, it discovers that function
abc is virtual. Thus it looks into the vtable for the address which turns out to be that of
www.
• In the second case, at the compile time the type revealed is xxx. But within the class
xxx, the function is not virtual and thus, the vtable does not come into play.
Now we remove virtual from the function abc of class yyy only. Remember, we are making
only one change at a time. The output now will be as follows:
The same explanation as given earlier applies here too. We hope you will remember us and
our brilliant explanation of the concept of virtual. At least, this is how we interpret it, and
we do not mind being the only ones to do so in this manner.
In IL, the scoping levels do not behave like those found in traditional
languages like C. Here, i is created as a new variable each time with the { brace, even
though all the variables are merged into one large locals directive.
Thus we refer to the individual variables i in their respective blocks. The ldloc.0 stands for
the first i, whereas ldloc.2 stands for the inner i that is visible in the outer brace.
The above program displays different values for the local variable i. The output proves that
they are created consecutively in memory.
Whenever you are in doubt, display the value of the variables and clear up the cobwebs in
your mind. Thus, scope blocks are also known as syntactic sugar and are only used to
increase the readability and to debug code written by others.
Internally, for a variable name, IL begins at the scope we are presently in, and recursively
tries to resolve the name of the variable. Thus, even though a declaration hides the name of
a variable, we can access it using the index. The scope does not change the lifetime of a
variable. All the variables in a method are created when we first enter the method, and die
when we exit from it. The variable is always accessible by the zero based index, that is
allocated on a "first come first served" basis.
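The claim that a scope block does not end the life of a local can be sketched as below. To stay clear of name-resolution details, this sketch uses two distinct names, i and j, rather than redeclaring i; both names and the assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly scopes {}            // assembly name is hypothetical

.method static void vijay() cil managed
{
    .entrypoint
    .locals init (int32 i)      // index 0
    ldc.i4.1
    stloc i
    {
        .locals init (int32 j)  // index 1, handed out first come first served
        ldc.i4.2
        stloc j
    }
    // The block has closed, yet both locals are still reachable by index.
    ldloc.0
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 1
    ldloc.1
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 2
    ret
}
```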
The above program demonstrates how a function accepts a variable number of parameters.
Vararg is a calling convention that allows passing a variable number of parameters to a
function. We
have created a variable called it, that looks like System.ArgIterator. We have then loaded
its address on the stack using ldloca and then called arglist. This instruction returns an
opaque handle i.e. an unmanaged pointer which represents all the arguments passed to
the method. This handle can be passed to other methods but is valid only during the
lifetime of the current method. This opaque handle is of the type RuntimeArgumentHandle.
The arglist instruction is valid on methods that take a variable number of arguments. The
constructor of the value class ArgIterator is called with this handle as a parameter.
Once the value class is instantiated, we place the address of a local variable x on the stack.
This is where the parameter passed to our function will be stored. Subsequently, the address
of variable it is put on the stack too. The function GetNextArg from class ArgIterator is
called, which places a typedref on the stack, which is then passed to the function ToObject.
Then, the object is cast to an int32 and unboxed, as we need a value type. This value is
copied to the variable x. vararg is a calling convention, and thus, part of the signature
of the method. We specify it as part of the call instruction. The ellipsis denotes the
end of the fixed parameters and the beginning of the variable number of parameters. This is
because a function may want to have a certain fixed number of parameters also.
The other functions of the class ArgIterator can also give us useful information, such as
the number of items on the stack.
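A condensed sketch of the vararg sequence is given below. It follows the steps described in the text but stores the result with stloc instead of through the address of x. The fixed parameter n, the value 100 and the assembly name are assumptions, and ArgIterator is assumed to be available, as it is on the Windows desktop runtime.

```il
.assembly extern mscorlib {}
.assembly varargs {}           // assembly name is hypothetical

.method static vararg void abc(int32 n) cil managed
{
    .locals init (valuetype [mscorlib]System.ArgIterator it, int32 x)
    ldloca it
    arglist                     // opaque RuntimeArgumentHandle
    call instance void [mscorlib]System.ArgIterator::.ctor(
        valuetype [mscorlib]System.RuntimeArgumentHandle)
    ldloca it
    call instance typedref [mscorlib]System.ArgIterator::GetNextArg()
    call object [mscorlib]System.TypedReference::ToObject(typedref)
    unbox [mscorlib]System.Int32    // managed pointer to the boxed int32
    ldind.i4
    stloc x
    ldloc x
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 100
    ret
}

.method static void vijay() cil managed
{
    .entrypoint
    ldc.i4.1                    // fixed parameter
    ldc.i4 100                  // first variable parameter
    call vararg void abc(int32, ..., int32)
    ret
}
```

The ellipsis in the call separates the single fixed int32 from the variable part of the argument list.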
We use method parameters to enable a method to accept data from the caller. Method
parameters are checked for type safety. They make it mandatory for a method to be called
with the correct parameters. The Execution Engine enforces the contract between the caller
and the called methods.
We are not compelled to assign any name to the parameters. In the above program, we
have a local as well as a parameter of type int32 which has no name or id. IL does not
seem to care at all. However, the unnamed variables can be referenced only as an index.
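Unnamed parameters and locals might be written as in the sketch below. The function abc, the named parameter j and the assembly name are assumptions; the unnamed items are reachable only through their index.

```il
.assembly extern mscorlib {}
.assembly noname {}            // assembly name is hypothetical

.method static void abc(int32, int32 j) cil managed
{
    .locals init (int32)        // a local with no name
    ldarg.0                     // the unnamed parameter, by index
    stloc.0                     // the unnamed local, by index
    ldloc.0
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 3
    ldarg j                     // the named parameter, by name
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 4
    ret
}

.method static void vijay() cil managed
{
    .entrypoint
    ldc.i4.3
    ldc.i4.4
    call void abc(int32, int32)
    ret
}
```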
Parameters can also have attributes, as we shall now see, but these attributes have
nothing to do with the signature.
The first attribute to a parameter is opt, which makes it optional. This means that it is not
compulsory to pass a parameter to our function.
Always read the fine print. The opt attribute may indicate that the parameter is optional,
but it is used for documentation purposes only. The compiler may place the opt attribute
on a parameter, so that other tools make sense of it. As far as the runtime is concerned,
however, all the parameters are mandatory, and it simply ignores the opt attribute. Thus,
opt has no significance for the runtime.
Implementation attributes provide a lot of information about the nature of the method to
the runtime. These attributes decide whether the method requires special handling at
runtime or not.
You should run the above program with and without the synchronized attribute to
appreciate its significance.
The attribute il managed tells the runtime that the method contains IL code that will run in
the managed world. We have created two threads, V_1 and V_2. These execute the same
function abc from class yyy.
In the function abc, we display numbers from 0 to 3, using a loop. After displaying a
number, the Sleep function stalls all operations for 1000 milliseconds. Thus the first
thread executes function abc, prints the value 0 and then sleeps. Now the second thread
takes advantage of the fact that the first thread is sleeping, and it also displays 0 and falls
asleep. This continues till we reach the value 3 and exit from the loop.
The synchronized attribute does not let the second thread execute the function until the
first thread has finished with it. Thus, the second thread has no choice but to wait until the
first thread finishes execution. Try implementing the above in C#.
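The threading experiment can be sketched roughly as follows. This sketch makes abc a static function of class yyy, which is an assumption, as are the label name and the assembly name; the locals V_1 and V_2 follow the text.

```il
.assembly extern mscorlib {}
.assembly locking {}           // assembly name is hypothetical

.class public yyy extends [mscorlib]System.Object
{
    // Remove the word synchronized to see the two threads interleave.
    .method public static void abc() cil managed synchronized
    {
        .locals init (int32 i)
    again:
        ldloc.0
        call void [mscorlib]System.Console::WriteLine(int32)
        ldc.i4 1000
        call void [mscorlib]System.Threading.Thread::Sleep(int32)
        ldloc.0
        ldc.i4.1
        add
        stloc.0
        ldloc.0
        ldc.i4.4
        blt again               // loop while i < 4
        ret
    }
    .method public static void vijay() cil managed
    {
        .entrypoint
        .locals init (class [mscorlib]System.Threading.Thread V_1,
                      class [mscorlib]System.Threading.Thread V_2)
        ldnull
        ldftn void yyy::abc()
        newobj instance void [mscorlib]System.Threading.ThreadStart::.ctor(object, native int)
        newobj instance void [mscorlib]System.Threading.Thread::.ctor(
            class [mscorlib]System.Threading.ThreadStart)
        stloc.0
        ldnull
        ldftn void yyy::abc()
        newobj instance void [mscorlib]System.Threading.ThreadStart::.ctor(object, native int)
        newobj instance void [mscorlib]System.Threading.Thread::.ctor(
            class [mscorlib]System.Threading.ThreadStart)
        stloc.1
        ldloc.0
        call instance void [mscorlib]System.Threading.Thread::Start()
        ldloc.1
        call instance void [mscorlib]System.Threading.Thread::Start()
        ret
    }
}
```

With the attribute in place, one thread should print 0 to 3 before the other begins; without it, the two sequences are expected to interleave.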
What we are trying to say is that if C# does not expose a feature of IL, there is no way
you can use it in any .cs program.
If a code implementation attribute is not given, the default value is il managed. The other
three options are native, optil and runtime. These are mutually exclusive. The runtime
attribute specifies that the implementation of the code will be supplied by the runtime, and
not by the programmer. We cannot place any code in this type of a method. It is used for
constructors and delegates.
On running the ‘a.exe’ executable, three message boxes pop up with the following message:
The program reported the above errors on the introduction of the new attribute optil. It
clearly says that it could not find a particular dll. The attribute optil means that the code is
an optimized IL code that runs faster.
We normally end all our attributes for a method with the qualifier managed or unmanaged.
The default value is managed. This signifies as to who will manage the execution of the
method.
• Managed signifies that the CLR will manage it.
• Unmanaged signifies that someone else will manage it.
If we use the unmanaged attribute with pure IL code we get the above exception.
There are over a trillion lines of code already written in the programming language C,
under the Windows Operating System. This code resides in files called DLLs or Dynamic
Link Libraries. To ensure that this code is also available to programs written in IL, C#
provides an attribute called DllImport.
To be technically accurate, code written in a dll has nothing to do with a programming
language. Once we obtain a dll, there is no way one can detect as to which programming
language it was originally written in. The C# compiler converts our attribute DllImport to a
method. This implies that C# understands attributes and depending upon the attribute it
generates relevant IL code. The method is called MessageBoxA and has the same
parameters that we specified in C#. The added attribute is pinvokeimpl, that is first passed
the name of the dll that contains the function.
Then we have a calling convention that has three parameters. The parameters are pushed
on the stack before the function gets called. The order of placing parameters on the stack
that IL follows is "first written first placed" i.e. from left to right. The winapi calling
convention follows the reverse order i.e. right to left.
Then, the name of the function has a number appended to it, specifying the size of the
parameters on the stack. Finally, the calling convention decides who restores the stack:
the caller or the callee.
The function MessageBoxA can be called in the same manner that any other static function
of IL gets called.
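A pinvokeimpl declaration and its call might look like the sketch below. It runs only on Windows. The parameter names are taken from the Win32 documentation rather than from the original listing, the window handle is declared as native int so that it has the right size on 64-bit systems, and the strings and assembly name are assumptions.

```il
.assembly extern mscorlib {}
.assembly msgbox {}            // assembly name is hypothetical

// No body: the code lives in user32.dll.
.method public static pinvokeimpl("user32.dll" winapi)
        int32 MessageBoxA(native int hWnd, string lpText,
                          string lpCaption, unsigned int32 uType)
        cil managed preservesig
{
}

.method static void vijay() cil managed
{
    .entrypoint
    ldc.i4.0
    conv.i                      // null window handle
    ldstr "hi"
    ldstr "title"
    ldc.i4.0                    // plain OK button
    call int32 MessageBoxA(native int, string, string, unsigned int32)
    pop                         // discard the button code that was returned
    ret
}
```

From the caller's side the call looks like a call to any other static function; the runtime handles the marshalling of the two strings.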
There are two primary ways of calling unmanaged methods:
• One is using pinvokeimpl,
• The other is using IJW (It Just Works).
In IJW, the runtime stays out of our way, and we have to write code for handling
everything. We stick to pinvokeimpl, the one we can work with. The runtime will
automatically take us from managed to unmanaged code, convert data types and handle all
the issues of transition management. The attributes to be used are native and unmanaged,
as that is what the documentation recommends. The C# compiler, however, is not familiar
with the documentation.
The above example uses recursion to find out the factorial of a number. It uses the prefix
tail., which marks the call that follows it as a tail call. Functional programming languages
like Lisp or Prolog use tail calls extensively. In a non-tail call, the current stack frame is
kept intact, and a new frame is allocated. This means that the stack position changes. In a
tail call, the stack frame is replaced with a frame for the function to be called.
When a call terminates with a ret, control returns to the caller function. In the case of
tail calls, control continues to remain with the called method. Since non-tail calls need to
store information as to who the caller is, they use up memory on the stack, and may limit
the amount of recursion that is possible. Thus, tail calls handle recursion more effectively
than non-tail calls.
The above program works even without the tail prefix.
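A tail-recursive factorial can be sketched as below. It is not the original listing: the function name fact, the accumulator parameter acc, the label and the assembly name are all assumptions.

```il
.assembly extern mscorlib {}
.assembly factorial {}         // assembly name is hypothetical

.method static int32 fact(int32 n, int32 acc) cil managed
{
    ldarg.0
    ldc.i4.1
    bgt recurse                 // keep going while n > 1
    ldarg.1
    ret                         // n <= 1: the accumulator is the answer
recurse:
    ldarg.0
    ldc.i4.1
    sub                         // n - 1
    ldarg.0
    ldarg.1
    mul                         // n * acc
    tail.
    call int32 fact(int32, int32)
    ret                         // a tail call must be followed directly by ret
}

.method static void vijay() cil managed
{
    .entrypoint
    ldc.i4.5
    ldc.i4.1
    call int32 fact(int32, int32)
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 120
    ret
}
```

Removing the tail. line leaves the result unchanged; only the way stack frames are used differs.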