



ASSEMBLER REFERENCE MANUAL




XEROX




610E00290
December 1986
Xerox Corporation
Information Systems Division
XDE Technical Services
475 Oakmead Parkway
Sunnyvale, CA 94086




Copyright © 1986, Xerox Corporation. All rights reserved.
XEROX®, 8010, and 860 are trademarks of XEROX CORPORATION.
Printed in U.S.A.
TABLE OF CONTENTS




1. Introduction
1.1 Introduction
2. General Rules
2.1 Sops and pseudoops
2.2 Files
2.3 Imports
2.4 Exports
2.5 Stack
2.6 Frames
2.7 Block structure and scoping
2.8 Addressing
2.9 Code constants
2.10 Comments
2.11 Identifiers
2.12 Labels
2.13 Constants
2.14 Expressions
3. Grammar
4. Pseudoops
5. Programming Hints
5.1 Static links
5.2 Parameter passing
6. Sops




1. Introduction




The Mesa assembler is a program that takes as input a file
containing assembly code and outputs an object file that may
be loaded directly by the Mesa loader or bound into a larger
object file by a linker. Object files all have the same format.
This allows object files created from different source languages
to be bound together. Programs in one language can thus
easily access global variables and procedures written in another
language. The assembler provides protection so that the rules
of object-oriented programming may be enforced. Programs
written in other languages are expected to make frequent use
of the extensive libraries written in Mesa (the window system,
for instance).

The major goal of the assembler is to be independent of any
high-level language. A portable compiler for any language
should find the instruction set of the assembler complete
enough that porting the compiler is simple. The assembler is
intended to be used in the porting of the Berkeley C and Pascal
portable compilers to the Xerox Development Environment
that runs on the Xerox 8010. If the machine architecture
changes, then changes to the ported compilers should be
limited to the assembler.

The assembler should be easy for compiler writers to use. That
is, the assembly language should be readable and should hide
some of the peculiarities of the Mesa machine architecture. The
Mesa architecture is described in detail in the Mesa Processor
Principles of Operation (PrincOps). The assembler instruction
set is a simplified version of the PrincOps instruction set. The
assembler is, to some extent, machine independent in that it
produces code that can run on any PrincOps machine.

The assembler also eases the compiler writer's task by doing a
significant amount of optimization on the assembly program
so the compiler may produce very naive code and the
performance of the final code will still be reasonably good. The
assembler performs peephole optimization, crossjumping,
unreachable code elimination, and other optimizations. No
global program optimizations are performed.

The reader is assumed to be intimately familiar with the
PrincOps. PrincOps code for specific programs may be
examined by running the Lister on an object file.




2. General Rules




The object file that the assembler produces is called a bcd file.
The format of bcds rarely changes and is guaranteed to be the
same for every bcd in a release. The bcd contains enough
information so that the linker and the loader can do their job.
The linker's job is resolving references to imports and exports
of the modules being linked and combining the code segments
of the modules into a single file. The loader's job is resolving
any imports and exports of the bcd with any bcds previously
loaded and allocating memory space for global frames and
code. This section gives some general rules for writing assembly
code that creates a valid bcd.




2.1 Sops and pseudoops

The assembler instruction set consists of two types of
instructions, sops and pseudoops. Sops are converted to one or
more PrincOps instructions (PrincOps instructions are known as
"mopcodes"). Sops are instructions like "pop" or "add".
Pseudoops contain information needed by the linker and
loader as well as information needed to build a symbol table.




2.2 Files

Filenames may be referred to by just their textual name or by
their unique id (uid). A uid is a name and version stamp. It is
expected that for C and Pascal, version stamps will not be
necessary, except where they access Mesa interfaces. In that
case the Mesa interfaces do not need to be opened to obtain
their stamps. The linker uses version stamps to guarantee
consistency across interfaces. It is up to the compiler to choose
whether it wants version checking or not. Consistency of
version stamps within an assembly program is not checked. If
two uids with the same name but different stamps appear in an
assembly program, the first stamp seen is used.
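The first-stamp-wins rule can be illustrated with a minimal Python sketch. The class name, method name, and string stamps below are hypothetical and chosen only for illustration; the assembler's real tables and stamp format are not described here.

```python
# Hypothetical sketch of the "first stamp seen is used" rule.
# UidTable, record, and the string stamps are illustrative assumptions,
# not the assembler's actual data structures.

class UidTable:
    """Maps a file name to the first version stamp seen for it."""

    def __init__(self):
        self._stamps = {}

    def record(self, name, stamp=None):
        """Register a reference to a file and return the stamp in effect.

        A reference by textual name alone (stamp=None) sets no stamp.
        Once a stamp is recorded, later stamps for the same name are
        ignored without any consistency check.
        """
        if self._stamps.get(name) is None:
            self._stamps[name] = stamp
        return self._stamps[name]


table = UidTable()
first = table.record("Window", stamp="stamp-A")
second = table.record("Window", stamp="stamp-B")  # conflicting stamp is ignored
```

Here both `first` and `second` hold "stamp-A", because the second uid does not replace the stamp already recorded.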




2.3 Imports

Imports allow a module to access variables, procedures, and
constants in other modules. The import pseudoop is a
convenience for specifying a version stamp for a particular
imported file and is only needed to import from Mesa
interfaces. Imported items from Mesa interfaces may be
referred to as "Interface.Item". Imported items from non-Mesa
modules may be referred to as "?.Item". If an lfc, read, or write
instruction uses a name that is undefined, then the name is
assumed to be imported and a link is generated. Constants are
imported just as variables are, but they are stuffed directly into
the code and no links are generated for them. Symbols for
imported items are always copied to the new object file.
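The resolution rule for names can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the dictionaries, and the way imported constants are recognized are hypothetical, since the text does not say how the assembler tells a constant from a variable.

```python
# Hypothetical sketch: how a name used by an instruction might be resolved.
# resolve_operand and its arguments are illustrative assumptions.

def resolve_operand(name, defined, imported_constants, links):
    """Classify a name as local, an inlined constant, or an import link.

    defined            -- names declared in this assembly program
    imported_constants -- imported constants with known values (assumed known)
    links              -- list of link names generated so far (mutated)
    """
    if name in defined:
        return ("local", defined[name])
    if name in imported_constants:
        # Constants go directly into the code; no link is generated.
        return ("constant", imported_constants[name])
    # An undefined name is assumed to be imported and gets a link.
    if name not in links:
        links.append(name)
    return ("link", links.index(name))


defined = {"counter": 4}
constants = {"Limits.maxSize": 256}
links = []
results = [
    resolve_operand("counter", defined, constants, links),
    resolve_operand("Limits.maxSize", defined, constants, links),
    resolve_operand("Window.Create", defined, constants, links),
    resolve_operand("?.helper", defined, constants, links),
]
```

After these calls, `links` contains only the two names that were undefined and not constants.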




2.4 Exports

Modules may make procedures or variables in their outer scope
available for import by other modules by exporting them.
Many languages do not have the notion of explicit exports.
Such languages need only put an export all pseudoop in
assembly programs, which automatically exports every
procedure and variable in the global scope. To export to a
Mesa interface, the pseudoops exportvar and exportproc are
available for selectively exporting particular variables or
procedures. If those pseudoops are used, then for each file
exported to there must be an export pseudoop that gives the
total number of items in the definitions file as well as an
optional time stamp.




2.5 Stack

For machine independence, an infinite stack should be
provided by the assembler so that the compiler need not worry
about stack overflow. However, this is quite complicated, and it
is more reasonable for the compiler to worry about it. The
compiler therefore must know that the actual number of
registers is 14, and should save the stack if it overflows. The
compiler should keep a stack model to prevent stack underflow
or overflow.
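A compiler-side stack model could look like the following Python sketch. The class and method names are hypothetical, and the pop and push counts passed in are supplied by the caller; only the depth limit of 14 comes from the text.

```python
# Hypothetical sketch of a compiler's stack model.
# StackModel and apply are illustrative names, not part of the assembler.

STACK_DEPTH = 14  # number of stack registers stated in the text


class StackModel:
    """Tracks the evaluation stack depth while code is being emitted."""

    def __init__(self, limit=STACK_DEPTH):
        self.limit = limit
        self.depth = 0

    def apply(self, pops, pushes):
        """Account for one instruction that pops, then pushes, stack words."""
        if pops > self.depth:
            raise ValueError("stack underflow")
        after = self.depth - pops + pushes
        if after > self.limit:
            raise OverflowError("stack overflow: save the stack first")
        self.depth = after
        return self.depth


model = StackModel()
model.apply(0, 1)                 # push one operand
model.apply(0, 1)                 # push another operand
depth_after_add = model.apply(2, 1)  # an add-like sop: two words in, one out
```

A compiler using such a model would emit code to save the stack whenever `apply` reports an overflow, rather than emitting the offending instruction.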




2.6 Frames


For local frames, the frame size is specified in the entry
pseudoop. For global variables, the assembler automatically
computes the amount of space necessary. The gbyte, gword,
and gblock pseudoops allocate space for global variables. The
maximum size of a local frame or a global frame is 4092 words.




2.7 Block structure and scoping

An assembly program has a block structure similar to a high-
level program. There are four types of blocks: the program
block, entry blocks, nested blocks, and unnamed blocks. All
code between the beginning and end of a block is part of the
block, except for code contained in inner blocks. The first
pseudoop in an assembly program must be a program
pseudoop, and there is only one such program block. Entry
blocks declared with the entry pseudoop (usually procedures in
the outer scope) are callable and have their own local frame.
Nested blocks declared with the nested pseudoop (usually
nested procedures) are callable and have their own local
frame. They may also access the local frame of their parent,
where the parent is the nearest surrounding block with a
frame. Unnamed blocks declared with the begin pseudoop are
not callable and share the local frame of their parent.
Unnamed blocks are used to help build the symbol table and
have no effect on the code generated. All types of blocks end
with the end pseudoop.
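The frame rules for the four block types can be modeled with a short Python sketch. The Block class below is hypothetical; it only mirrors the rules stated above (entry and nested blocks own a local frame, unnamed blocks share their parent's) and assumes the program block has no local frame of its own.

```python
# Hypothetical model of block structure; not the assembler's own structure.

class Block:
    KINDS = {"program", "entry", "nested", "begin"}
    HAS_LOCAL_FRAME = {"entry", "nested"}

    def __init__(self, kind, name=None, parent=None):
        if kind not in self.KINDS:
            raise ValueError(f"unknown block kind: {kind}")
        self.kind = kind
        self.name = name
        self.parent = parent

    def frame_owner(self):
        """Nearest block, starting with this one, that owns a local frame."""
        block = self
        while block is not None and block.kind not in self.HAS_LOCAL_FRAME:
            block = block.parent
        return block

    def parent_frame_owner(self):
        """Nearest surrounding block with a frame (the 'parent' in the text)."""
        if self.parent is None:
            return None
        return self.parent.frame_owner()


program = Block("program")
outer = Block("entry", "Outer", parent=program)
scope = Block("begin", parent=outer)             # unnamed block inside Outer
inner = Block("nested", "Inner", parent=scope)   # nested procedure
```

In this model `scope.frame_owner()` is `outer`, and `inner.parent_frame_owner()` is also `outer`, because the unnamed block between them has no frame of its own.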

The assembler tries not to enforce any unnecessary scoping or
pseudoop placement. For instance, global declarations,
imports, and exports may be located anywhere in a program
with the same effect. Code constants are gathered and put out
before the code for the nearest enclosing procedure.




2.8 Addressing

PrincOps instructions do not have addressing modes. There are
more instructions than there would be if there were addressing
modes. The number of instructions was reduced for the
assembler by allowing different addressing modes. It was felt
that this would make the sops more mnemonic and uniform
for compiler writers.

There are two forms of addressing, immediate and eventual.
They have the forms

immediate: sopname.format length expression
eventual: sopname.format length [base offset] indirection

The format is one of .r, .d, .f[num:num], .f[], or empty. r
specifies that an argument is in floating-point format. d
specifies that an argument is a double word. f specifies that a
field within a word is to be operated on; the first number is the
offset of the first bit in the field, and the second number is the
length in bits of the field. f[] specifies that the field descriptor
is on the stack and is not a code argument.
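The effect of a field descriptor such as .f[4:8] can be sketched in Python. This is an illustration only: the function names are hypothetical, and the sketch assumes a 16-bit word with bit offsets counted from the most significant bit, which the text above does not state.

```python
# Hypothetical sketch of field access within a 16-bit word.
# Assumption: offset 0 is the most significant bit of the word.

WORD_BITS = 16


def _check(offset, length):
    if offset < 0 or length < 1 or offset + length > WORD_BITS:
        raise ValueError("field does not fit in a word")


def read_field(word, offset, length):
    """Return the `length`-bit field that starts at bit `offset`."""
    _check(offset, length)
    shift = WORD_BITS - offset - length
    return (word >> shift) & ((1 << length) - 1)


def write_field(word, offset, length, value):
    """Return `word` with the field replaced by the low bits of `value`."""
    _check(offset, length)
    shift = WORD_BITS - offset - length
    mask = ((1 << length) - 1) << shift
    return (word & ~mask & 0xFFFF) | ((value << shift) & mask)


field = read_field(0xABCD, 4, 8)            # the middle eight bits
updated = write_field(0xABCD, 4, 8, 0x12)   # replace those bits
```

Under the stated bit-numbering assumption, `field` is 0xBC and `updated` is 0xA12D.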

The length is one of l, k, or empty. l specifies that one of the
arguments is a long pointer. If l is not present, short pointers
are assumed. k specifies that the argument is a link. Generally
the k is not used.

In immediate addressing, the expression may be an assembly
time constant (for instance, in a load immediate instruction).
The expression may be the name of an imported variable or
procedure (for instance, in an external function call
instruction).

In eventual addressing, the base may be either lf, gf, or cb. lf is
the start of the local frame. gf is the start of the global frame.
cb is the start of the code segment. The offset has the form "+
expression", where expression is an assembly time constant.
The offset may be empty, in which case it is assumed to be zero.
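The base and offset part of an eventual operand can be sketched in Python as follows. The exact spelling and spacing accepted by the assembler are assumptions here, the sketch handles only decimal literals as the offset expression, and it ignores the trailing indirection part entirely.

```python
# Hypothetical sketch: parsing "[base offset]" and computing an address.
# The accepted spelling, e.g. "[lf + 3]", is an assumption for illustration.

import re

_OPERAND = re.compile(r"\[\s*(lf|gf|cb)\s*(?:\+\s*(\d+))?\s*\]")


def parse_eventual(operand):
    """Split an operand into (base, offset); an empty offset means zero."""
    match = _OPERAND.fullmatch(operand.strip())
    if match is None:
        raise ValueError(f"not an eventual operand: {operand!r}")
    base, offset = match.group(1), match.group(2)
    return (base, int(offset) if offset is not None else 0)


def effective_address(operand, bases):
    """Add the offset to the start address of the named base.

    bases maps "lf", "gf", and "cb" to plain integer start addresses.
    """
    base, offset = parse_eventual(operand)
    return bases[base] + offset


bases = {"lf": 100, "gf": 200, "cb": 300}  # made-up start addresses
local_slot = effective_address("[lf + 3]", bases)
global_start = effective_address("[gf]", bases)
```

With the made-up base addresses above, `local_slot` is 103 and `global_start` is 200, the latter showing the empty offset treated as zero.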


