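GNU Assembler Directives Commonly Used in the U-Boot Source

Date: 2022-06-04 03:09:14

The GNU assembler, as, targets processors of many architectures.

This post covers only the ARM-related material and only the ELF file format.

The directives listed here are limited to those that appear in U-Boot's assembly sources or are otherwise commonly used. To learn more, follow the link below.

My time and ability are limited, so to avoid introducing misunderstandings through translation, the original text is copied as is. Hard going for a beginner.

Reference documentation: http://sourceware.org/binutils/docs-2.20/as/index.html#Top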

===============================================================================================================================

ARM Machine Directives:

.align expression [, expression]
This is the generic .align directive. For the ARM however if the first argument is zero (i.e. no alignment is needed) the assembler will behave as if the argument had been 2 (i.e. pad to the next four byte boundary). This is for compatibility with ARM's own assembler.

name .req register name
This creates an alias for register name called name. For example:

foo .req r0

.code [16|32]
This directive selects the instruction set being generated. The value 16 selects Thumb, with the value 32 selecting ARM.

.thumb
This performs the same action as .code 16.

.arm
This performs the same action as .code 32.

.force_thumb
This directive forces the selection of Thumb instructions, even if the target processor does not support those instructions.

.thumb_func
This directive specifies that the following symbol is the name of a Thumb encoded function. This information is necessary in order to allow the assembler and linker to generate correct code for interworking between Arm and Thumb instructions and should be used even if interworking is not going to be performed. The presence of this directive also implies .thumb.

.thumb_set
This performs the equivalent of a .set directive in that it creates a symbol which is an alias for another symbol (possibly not yet defined). This directive also has the added property in that it marks the aliased symbol as being a thumb function entry point, in the same way that the .thumb_func directive does.

.ltorg
This directive causes the current contents of the literal pool to be dumped into the current section (which is assumed to be the .text section) at the current location (aligned to a word boundary).

.pool
This is a synonym for .ltorg.
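
A minimal sketch of how several of these directives might be combined. The labels arm_entry and thumb_entry and the alias scratch are hypothetical and not taken from U-Boot; a Thumb-capable target is assumed. The `ldr rX, =value` pseudo-instruction is not described above, and is used here only because it is the usual source of literal-pool entries.

```asm
        .text

        .arm                            @ same as .code 32
arm_entry:
scratch .req    r0                      @ scratch is now an alias for r0
        ldr     scratch, =0x12345678    @ the constant goes into the literal pool
        bx      lr
        .ltorg                          @ dump the literal pool here, after the code

        .thumb                          @ same as .code 16
        .thumb_func                     @ the next symbol names a Thumb function
thumb_entry:
        bx      lr
```

Placing .ltorg right after an unconditional return keeps the pool data out of the execution path.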

.global symbol or .globl symbol:

.global makes the symbol visible to ld. If you define symbol in your partial program, its value is made available to other partial programs that are linked with it. Otherwise, symbol takes its attributes from a symbol of the same name from another file linked into the same program.

Both spellings (`.globl' and `.global') are accepted, for compatibility with other assemblers.

.word expressions:

This directive expects zero or more expressions, of any section, separated by commas.

The size of the number emitted, and its byte order, depend on what target computer the assembly is for.

Warning: Special Treatment to support Compilers

Machines with a 32-bit address space, but that do less than 32-bit addressing, require the following special treatment. If the machine of interest to you does 32-bit addressing (or doesn't require it; see Machine Dependencies), you can ignore this issue.

In order to assemble compiler output into something that works, as occasionally does strange things to `.word' directives. Directives of the form `.word sym1-sym2' are often emitted by compilers as part of jump tables. Therefore, when as assembles a directive of the form `.word sym1-sym2', and the difference between sym1 and sym2 does not fit in 16 bits, as creates a secondary jump table, immediately before the next label. This secondary jump table is preceded by a short-jump to the first byte after the secondary table. This short-jump prevents the flow of control from accidentally falling into the new table. Inside the table is a long-jump to sym2. The original `.word' contains sym1 minus the address of the long-jump to sym2.

If there were several occurrences of `.word sym1-sym2' before the secondary jump table, all of them are adjusted. If there was a `.word sym3-sym4', that also did not fit in sixteen bits, a long-jump to sym4 is included in the secondary jump table, and the .word directives are adjusted to contain sym3 minus the address of the long-jump to sym4; and so on, for as many entries in the original jump table as necessary.

.balign[wl] abs-expr, abs-expr, abs-expr:

Pad the location counter (in the current subsection) to a particular storage boundary. The first expression (which must be absolute) is the alignment request in bytes. For example `.balign 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

The second expression (also absolute) gives the fill value to be stored in the padding bytes. It (and the comma) may be omitted. If it is omitted, the padding bytes are normally zero. However, on some systems, if the section is marked as containing code and the fill value is omitted, the space is filled with no-op instructions.

The third expression is also absolute, and is also optional. If it is present, it is the maximum number of bytes that should be skipped by this alignment directive. If doing the alignment would require skipping more bytes than the specified maximum, then the alignment is not done at all. You can omit the fill value (the second argument) entirely by simply using two commas after the required alignment; this can be useful if you want the alignment to be filled with no-op instructions when appropriate.

The .balignw and .balignl directives are variants of the .balign directive. The .balignw directive treats the fill pattern as a two byte word value. The .balignl directive treats the fill pattern as a four byte longword value. For example, .balignw 4,0x368d will align to a multiple of 4. If it skips two bytes, they will be filled in with the value 0x368d (the exact placement of the bytes depends upon the endianness of the processor). If it skips 1 or 3 bytes, the fill value is undefined.
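
A small sketch of the padding arithmetic described above, written for ARM. The label names are hypothetical, and the offsets in the comments assume this is the first data placed in the file's .data section.

```asm
        .data
        .byte   0x11            @ offset 0; the location counter is now 1
        .balign 4               @ pad up to the next multiple of 4
word_at_4:
        .word   0xdeadbeef      @ stored at offset 4

        .hword  0x2222          @ offset 8; the location counter is now 10
        .balignw 4, 0x368d      @ skips 2 bytes, filled with the 16-bit pattern 0x368d
aligned_at_12:
        .word   0               @ stored at offset 12
```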

.macro:

The commands .macro and .endm allow you to define macros that generate assembly output. For example, this definition specifies a macro sum that puts a sequence of numbers into memory:

     .macro  sum from=0, to=5
     .long   \from
     .if     \to-\from
     sum     "(\from+1)",\to
     .endif
     .endm

With that definition, `sum 0,5' is equivalent to this assembly input:

     .long   0
     .long   1
     .long   2
     .long   3
     .long   4
     .long   5

.macro macname
.macro macname macargs ...
Begin the definition of a macro called macname. If your macro definition requires arguments, specify their names after the macro name, separated by commas or spaces. You can qualify the macro argument to indicate whether all invocations must specify a non-blank value (through `:req'), or whether it takes all of the remaining arguments (through `:vararg'). You can supply a default value for any macro argument by following the name with `=deflt'. You cannot define two macros with the same macname unless it has been subject to the .purgem directive (see Purgem) between the two definitions.

For example, these are all valid .macro statements:
.macro comm
Begin the definition of a macro called comm, which takes no arguments. 
.macro plus1 p, p1
.macro plus1 p p1
Either statement begins the definition of a macro called plus1, which takes two arguments; within the macro definition, write `\p' or `\p1' to evaluate the arguments. 
.macro reserve_str p1=0 p2
Begin the definition of a macro called reserve_str, with two arguments. The first argument has a default value, but not the second. After the definition is complete, you can call the macro either as `reserve_str a,b' (with `\p1' evaluating to a and `\p2' evaluating to b), or as `reserve_str ,b' (with `\p1' evaluating as the default, in this case `0', and `\p2' evaluating to b). 
.macro m p1:req, p2=0, p3:vararg
Begin the definition of a macro called m, with at least three arguments. The first argument must always have a value specified, but not the second, which instead has a default value. The third formal will get assigned all remaining arguments specified at invocation time.

When you call a macro, you can specify the argument values either by position, or by keyword. For example, `sum 9,17' is equivalent to `sum to=17, from=9'.

Note that since each of the macargs can be an identifier exactly as any other one permitted by the target architecture, there may be occasional problems if the target hand-crafts special meanings to certain characters when they occur in a special position. For example, if the colon (:) is generally permitted to be part of a symbol name, but the architecture specific code special-cases it when occurring as the final character of a symbol (to denote a label), then the macro parameter replacement code will have no way of knowing that and consider the whole construct (including the colon) an identifier, and check only this identifier for being the subject to parameter substitution. So for example this macro definition:

     .macro label l
     \l:
     .endm

might not work as expected. Invoking `label foo' might not create a label called `foo' but instead just insert the text `\l:' into the assembler source, probably generating an error about an unrecognised identifier.

Similarly problems might occur with the period character (`.') which is often allowed inside opcode names (and hence identifier names). So for example constructing a macro to build an opcode from a base name and a length specifier like this:

     .macro opcode base length
     \base.\length
     .endm

and invoking it as `opcode store l' will not create a `store.l' instruction but instead generate some kind of error as the assembler tries to interpret the text `\base.\length'.

There are several possible ways around this problem:

Insert white space
If it is possible to use white space characters then this is the simplest solution. e.g.:

     .macro label l
     \l :
     .endm

Use `\()'
The string `\()' can be used to separate the end of a macro argument from the following text. e.g.:

     .macro opcode base length
     \base\().\length
     .endm

Use the alternate macro syntax mode
In the alternative macro syntax mode the ampersand character (`&') can be used as a separator. e.g.:

     .altmacro
     .macro label l
     l&:
     .endm

Note: this problem of correctly identifying string parameters to pseudo ops also applies to the identifiers used in .irp (see Irp) and .irpc (see Irpc) as well. 

.endm
Mark the end of a macro definition.

.exitm
Exit early from the current macro definition.

\@
as maintains a counter of how many macros it has executed in this pseudo-variable; you can copy that number to your output with `\@', but only within a macro definition.

LOCAL name [ , ... ]
Warning: LOCAL is only available if you select “alternate macro syntax” with `--alternate' or .altmacro. See .altmacro.
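
The pieces above (required arguments, default values, keyword arguments and `\@') can be sketched together in one ARM macro. The macro name wait_loop and its parameters are hypothetical, chosen only for illustration.

```asm
        @ wait_loop: hypothetical macro that counts \reg down from \count to zero.
        .macro  wait_loop reg:req, count=16
        mov     \reg, #\count
.Lwait\@:                               @ \@ makes the label unique per expansion
        subs    \reg, \reg, #1
        bne     .Lwait\@
        .endm

        .text
        wait_loop r0                    @ positional argument, default count of 16
        wait_loop reg=r1, count=8       @ the same macro, arguments given by keyword
```

Without `\@', the second expansion would define the same label a second time and the assembler would reject it.
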
.align abs-expr, abs-expr, abs-expr:

Pad the location counter (in the current subsection) to a particular storage boundary. The first expression (which must be absolute) is the alignment required, as described below.

The second expression (also absolute) gives the fill value to be stored in the padding bytes. It (and the comma) may be omitted. If it is omitted, the padding bytes are normally zero. However, on some systems, if the section is marked as containing code and the fill value is omitted, the space is filled with no-op instructions.

The third expression is also absolute, and is also optional. If it is present, it is the maximum number of bytes that should be skipped by this alignment directive. If doing the alignment would require skipping more bytes than the specified maximum, then the alignment is not done at all. You can omit the fill value (the second argument) entirely by simply using two commas after the required alignment; this can be useful if you want the alignment to be filled with no-op instructions when appropriate.

The way the required alignment is specified varies from system to system. For the arc, hppa, i386 using ELF, i860, iq2000, m68k, or32, s390, sparc, tic4x, tic80 and xtensa, the first expression is the alignment request in bytes. For example `.align 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed. For the tic54x, the first expression is the alignment request in words.

For other systems, including ppc, i386 using a.out format, arm and strongarm, it is the number of low-order zero bits the location counter must have after advancement. For example `.align 3' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed.

This inconsistency is due to the different behaviors of the various native assemblers for these systems which GAS must emulate. GAS also provides .balign and .p2align directives, described later, which have a consistent behavior across all architectures (but are specific to GAS).
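
As a sketch of what this means on ARM, the three directives below all request an 8-byte boundary. The labels are hypothetical, and the offsets in the comments assume this is the first data placed in the file's .data section.

```asm
        .data
        .byte   1
        .align  3               @ ARM: 3 low-order zero bits, i.e. a multiple of 8
a8:     .word   0               @ a8 is at offset 8

        .byte   1
        .p2align 3              @ power-of-two form, same meaning on every target
b8:     .word   0               @ b8 is at offset 16

        .byte   1
        .balign 8               @ byte-count form, same meaning on every target
c8:     .word   0               @ c8 is at offset 24
```

Writing `.align 8' on ARM would instead request a 256-byte boundary, which is why code meant to be portable tends to prefer .balign or .p2align.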

.section name:

Use the .section directive to assemble the following code into a section named name. This directive is only supported for targets that actually support arbitrarily named sections; on a.out targets, for example, it is not accepted, even with a standard a.out section name.

ELF Version

This is one of the ELF section stack manipulation directives. The others are .subsection (see SubSection), .pushsection (see PushSection), .popsection (see PopSection), and .previous (see Previous).

For ELF targets, the .section directive is used like this:

     .section name [, "flags"[, @type[,flag_specific_arguments]]]

The optional flags argument is a quoted string which may contain any combination of the following characters:

a
section is allocatable 
w
section is writable 
x
section is executable 
M
section is mergeable 
S
section contains zero terminated strings 
G
section is a member of a section group 
T
section is used for thread-local-storage

The optional type argument may contain one of the following constants:

@progbits
section contains data 
@nobits
section does not contain data (i.e., section only occupies space) 
@note
section contains data which is used by things other than the program 
@init_array
section contains an array of pointers to init functions 
@fini_array
section contains an array of pointers to finish functions 
@preinit_array
section contains an array of pointers to pre-init functions

Many targets only support the first three section types.

Note: on targets where the @ character is the start of a comment (e.g. ARM), another character is used instead. For example the ARM port uses the % character.
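
A brief sketch of the ELF form as it would be written for ARM, using % in place of @. The section and label names are hypothetical, and .space (not covered in this article) is used only to reserve bytes.

```asm
        @ Code: allocatable and executable, section contains data.
        .section .text.demo, "ax", %progbits
demo_code:
        mov     r0, #0
        bx      lr

        @ Zero-initialised storage: allocatable and writable, occupies space only.
        .section .bss.demo, "aw", %nobits
demo_buf:
        .space  64
```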

If flags contains the M symbol then the type argument must be specified as well as an extra argument—entsize—like this:

     .section name , "flags"M, @type, entsize

Sections with the M flag but not S flag must contain fixed size constants, each entsize octets long. Sections with both M and S must contain zero terminated strings where each character is entsize bytes long. The linker may remove duplicates within sections with the same name, same entity size and same flags. entsize must be an absolute expression. For sections with both M and S, a string which is a suffix of a larger string is considered a duplicate. Thus "def" will be merged with "abcdef"; a reference to the first "def" will be changed to a reference to "abcdef"+3.
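
The "def"/"abcdef" case can be sketched like this for ARM. The section name follows a pattern compilers commonly emit but is an assumption here, as are the labels; .asciz (not covered in this article) emits a zero-terminated string.

```asm
        @ Mergeable, zero-terminated strings with 1-byte characters (entsize = 1).
        .section .rodata.str1.1, "aMS", %progbits, 1
msg_long:
        .asciz  "abcdef"
msg_short:
        .asciz  "def"           @ the linker may fold this into the tail of "abcdef"
```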

If flags contains the G symbol then the type argument must be present along with an additional field like this:

     .section name , "flags"G, @type, GroupName[, linkage]

The GroupName field specifies the name of the section group to which this particular section belongs. The optional linkage field can contain:

comdat
indicates that only one copy of this section should be retained 
.gnu.linkonce
an alias for comdat

Note: if both the M and G flags are present then the fields for the Merge flag should come first, like this:

     .section name , "flags"MG, @type, entsize, GroupName[, linkage]

If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to have none of the above flags: it will not be allocated in memory, nor writable, nor executable. The section will contain data.

For ELF targets, the assembler supports another type of .section directive for compatibility with the Solaris assembler:

     .section "name"[, flags...]

Note that the section name is quoted. There may be a sequence of comma separated flags:

#alloc
section is allocatable 
#write
section is writable 
#execinstr
section is executable 
#tls
section is used for thread local storage

This directive replaces the current section and subsection. See the contents of the gas testsuite directory gas/testsuite/gas/elf for some examples of how this directive and the other section stack directives work.

.type:

This directive is used to set the type of a symbol.

ELF Version

For ELF targets, the .type directive is used like this:

     .type name , type description

This sets the type of symbol name to be either a function symbol or an object symbol. There are five different syntaxes supported for the type description field, in order to provide compatibility with various other assemblers.

Because some of the characters used in these syntaxes (such as `@' and `#') are comment characters for some architectures, some of the syntaxes below do not work on all architectures. The first variant will be accepted by the GNU assembler on all architectures so that variant should be used for maximum portability, if you do not need to assemble your code with other assemblers.

The syntaxes supported are:

     .type <name> STT_<TYPE_IN_UPPER_CASE>
     .type <name>,#<type>
     .type <name>,@<type>
     .type <name>,%<type>
     .type <name>,"<type>"

The types supported are:

STT_FUNC
function
Mark the symbol as being a function name. 
STT_GNU_IFUNC
gnu_indirect_function
Mark the symbol as an indirect function when evaluated during reloc processing. (This is only supported on Linux targeted assemblers). 
STT_OBJECT
object
Mark the symbol as being a data object. 
STT_TLS
tls_object
Mark the symbol as being a thread-local data object.
STT_COMMON
common
Mark the symbol as being a common data object. 
STT_NOTYPE
notype
Does not mark the symbol in any way. It is supported just for completeness.  
gnu_unique_object
Marks the symbol as being a globally unique data object. The dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use. (This is only supported on Linux targeted assemblers).

Note: Some targets support extra types in addition to those listed above.
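
On ARM, where `@' starts a comment, the `%' syntax is the natural choice. A minimal sketch, with hypothetical symbol names:

```asm
        .text
        .global demo_func
        .type   demo_func, %function    @ marks demo_func as a function symbol
demo_func:
        bx      lr

        .data
        .global demo_var
        .type   demo_var, %object       @ marks demo_var as a data object
demo_var:
        .word   0
```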

.text subsection:

Tells as to assemble the following statements onto the end of the text subsection numbered subsection, which is an absolute expression. If subsection is omitted, subsection number zero is used.

.set symbol, expression:

Set the value of symbol to expression. This changes symbol's value and type to conform to expression. If symbol was flagged as external, it remains flagged (see Symbol Attributes).

You may .set a symbol many times in the same assembly.

If you .set a global symbol, the value stored in the object file is the last value stored into it.

The syntax for set on the HPPA is `symbol .set expression'.

On Z80 set is a real instruction, use `symbol defl expression' instead.

.rept count:

Repeat the sequence of lines between the .rept directive and the next .endr directive count times.

For example, assembling

     .rept   3
     .long   0
     .endr

is equivalent to assembling

     .long   0
     .long   0
     .long   0

.byte expressions:

.byte expects zero or more expressions, separated by commas. Each expression is assembled into the next byte.

.int expressions:

Expect zero or more expressions, of any section, separated by commas. For each expression, emit a number that, at run time, is the value of that expression. The byte order and bit size of the number depends on what kind of target the assembly is for.

.short expressions:

.short is normally the same as `.word'. See .word.

In some configurations, however, .short and .word generate numbers of different lengths. See Machine Dependencies.

.hword expressions:

This expects zero or more expressions, and emits a 16 bit number for each.

This directive is a synonym for `.short'; depending on the target architecture, it may also be a synonym for `.word'.

.long expressions:

.long is the same as `.int'. See .int.
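
The sizes are target dependent. As a sketch for ARM, where .word, .int and .long emit 32 bits while .short and .hword emit 16 bits (label names are hypothetical; offsets assume this is the first data placed in the file's .data section):

```asm
        .data
words:  .word   0x12345678      @ 4 bytes at offset 0
        .int    1               @ 4 bytes at offset 4
        .long   2               @ 4 bytes at offset 8, same as .int
halfs:  .hword  0x1234          @ 2 bytes at offset 12
        .short  0x5678          @ 2 bytes at offset 14
bytes:  .byte   0x12, 0x34      @ 1 byte each, at offsets 16 and 17
```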

.org new-lc , fill:

Advance the location counter of the current section to new-lc. new-lc is either an absolute expression or an expression with the same section as the current subsection. That is, you can't use .org to cross sections: if new-lc has the wrong section, the .org directive is ignored. To be compatible with former assemblers, if the section of new-lc is absolute, as issues a warning, then pretends the section of new-lc is the same as the current subsection.

.org may only increase the location counter, or leave it unchanged; you cannot use .org to move the location counter backwards.

Because as tries to assemble programs in one pass, new-lc may not be undefined. If you really detest this restriction we eagerly await a chance to share your improved assembler.

Beware that the origin is relative to the start of the section, not to the start of the subsection. This is compatible with other people's assemblers.

When the location counter (of the current subsection) is advanced, the intervening bytes are filled with fill which should be an absolute expression. If the comma and fill are omitted, fill defaults to zero.
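
A minimal ARM sketch of .org with an explicit fill value. The labels are hypothetical, and the offsets assume this is the first code placed in the file's .text section.

```asm
        .text
entry:
        b       handler         @ 4 bytes at offset 0
        .org    0x20, 0xff      @ advance to offset 0x20, filling the gap with 0xff
handler:
        bx      lr              @ assembled at offset 0x20
```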

.extern:

.extern is accepted in the source program—for compatibility with other assemblers—but it is ignored. as treats all undefined symbols as external.

.size:

This directive is used to set the size associated with a symbol.

ELF Version

For ELF targets, the .size directive is used like this:

     .size name , expression

This directive sets the size associated with a symbol name. The size in bytes is computed from expression which can make use of label arithmetic. This directive is typically used to set the size of function symbols.
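
The label arithmetic mentioned above usually takes the form `. - name', placed right after the last instruction of the function. A sketch for ARM, with a hypothetical function name:

```asm
        .text
        .global get_zero
        .type   get_zero, %function
get_zero:
        mov     r0, #0
        bx      lr
        .size   get_zero, . - get_zero  @ current location minus start = 8 bytes
```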

.hidden names:

This is one of the ELF visibility directives. The other two are .internal (see .internal) and .protected (see .protected).

This directive overrides the named symbols' default visibility (which is set by their binding: local, global or weak). The directive sets the visibility to hidden which means that the symbols are not visible to other components. Such symbols are always considered to be protected as well.

.equ symbol, expression:

This directive sets the value of symbol to expression. It is synonymous with `.set'; see .set.

The syntax for equ on the HPPA is `symbol .equ expression'.

The syntax for equ on the Z80 is `symbol equ expression'. On the Z80 it is an error if symbol is already defined, but the symbol is not protected from later redefinition. Compare Equiv.
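
A short ARM sketch of .equ and .set used as assembly-time constants. The symbol names STACK_SIZE and counter are hypothetical.

```asm
        .equ    STACK_SIZE, 0x400       @ hypothetical constant
        .set    counter, 0
        .set    counter, counter + 1    @ a symbol may be .set more than once

        .text
        mov     r0, #STACK_SIZE         @ r0 = 0x400
        mov     r1, #counter            @ r1 = 1, the value at this point
```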

.data subsection:

.data tells as to assemble the following statements onto the end of the data subsection numbered subsection (which is an absolute expression). If subsection is omitted, it defaults to zero.

.include "file":

This directive provides a way to include supporting files at specified points in your source program. The code from file is assembled as if it followed the point of the .include; when the end of the included file is reached, assembly of the original file continues. You can control the search paths used with the `-I' command-line option (see Command-Line Options). Quotation marks are required around file.

.error "string":

Similarly to .err, this directive emits an error, but you can specify a string that will be emitted as the error message. If you don't specify the message, it defaults to ".error directive invoked in source file". See Error and Warning Messages.

      .error "This code has not been assembled and tested."

.local names:

This directive, which is available for ELF targets, marks each symbol in the comma-separated list of names as a local symbol so that it will not be externally visible. If the symbols do not already exist, they will be created.

For targets where the .lcomm directive (see Lcomm) does not accept an alignment argument, which is the case for most ELF targets, the .local directive can be used in combination with .comm (see Comm) to define aligned local common data.
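
The .local plus .comm combination might look like this on an ELF target. The symbol name scratch_buf is hypothetical.

```asm
        .local  scratch_buf             @ the symbol will not be externally visible
        .comm   scratch_buf, 64, 8      @ 64 bytes of common data, 8-byte aligned
```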

.struct expression:

Switch to the absolute section, and set the section offset to expression, which must be an absolute expression. You might use this as follows:

     .struct 0
     field1:
     .struct field1 + 4
     field2:
     .struct field2 + 4
     field3:

This would define the symbol field1 to have the value 0, the symbol field2 to have the value 4, and the symbol field3 to have the value 8. Assembly would be left in the absolute section, and you would need to use a .section directive of some sort to change to some other section before further assembly.
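
A sketch of how such offsets might then be used from ARM code. It assumes r0 holds the base address of a structure laid out this way.

```asm
        .struct 0
field1:
        .struct field1 + 4
field2:
        .struct field2 + 4
field3:

        .text                           @ leave the absolute section again
        ldr     r1, [r0, #field2]       @ load the word at offset 4
        ldr     r2, [r0, #field3]       @ load the word at offset 8
```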