LW Tool Chain

Chapter 1. Introduction

The LW tool chain provides utilities for building binaries for MC6809 and HD6309 CPUs. The tool chain includes a cross-assembler and a cross-linker which support several styles of output.

1.1. History

For a long time, I have had an interest in creating an operating system for the CoCo 3. I finally started working on that project around the beginning of 2006. I had a number of assemblers I could choose from. Eventually, I settled on one and started tinkering. After a while, I realized that assembler was not going to be sufficient due to lack of macros and issues with forward references. Then I tried another which handled forward references correctly but still did not support macros. I looked around at other assemblers and they all lacked one feature or another that I really wanted for creating my operating system.

The solution seemed clear at that point. I am a fair programmer so I figured I could write an assembler that would do everything I wanted an assembler to do. Thus the LWASM project was born. After more than two years of on and off work, version 1.0 of LWASM was released in October of 2008.

As the aforementioned operating system project progressed further, it became clear that while assembling the whole project through a single file was doable, it was not practical. When I found myself playing some fancy games with macros in a bid to simulate sections, I realized I needed a means of assembling source files separately and linking them later. This spawned a major development effort to add object file support to LWASM. It also spawned the LWLINK project to provide a means to actually link the files.


Chapter 2. Output Formats

The LW tool chain supports multiple output formats. Each format has its advantages and disadvantages. Each format is described below.

2.1. Raw Binaries

A raw binary is simply a string of bytes. There are no headers or other niceties. Both LWLINK and LWASM support generating raw binaries. ORG directives in the source code only serve to set the addresses that will be used for symbols but otherwise have no direct impact on the resulting binary.


Chapter 3. LWASM

The LWTOOLS assembler is called LWASM. This chapter documents the various features of the assembler. It is not, however, a tutorial on 6x09 assembly language programming.

3.1. Command Line Options

The binary for LWASM is called "lwasm". Note that the binary is in lower case. lwasm takes the following command line arguments.

--decb, -b: Select the DECB output format target. Equivalent to --format=decb.
--format=type, -f type: Select the output format. Valid values are obj for the object file target, decb for the DECB LOADM format, and raw for a raw binary.
--list[=file], -l[file]: Cause LWASM to generate a listing. If file is specified, the listing will go to that file. Otherwise it will go to the standard output stream. By default, no listing is generated.
--obj: Select the proprietary object file format as the output target.
--output=FILE, -o FILE: This option specifies the name of the output file. If not specified, the default is a.out.
--pragma=pragma, -p pragma: Specify assembler pragmas. Multiple pragmas are separated by commas. The pragmas accepted are the same as for the PRAGMA assembler directive described below.
--raw, -r: Select raw binary as the output target.
--help, -?: Present a help screen describing the command line options.
--usage: Provide a summary of the command line options.
--version, -V: Display the software version.
--debug, -d: Increase the debugging level. Only really useful to people hacking on the LWASM source code itself.


Chapter 4. LWLINK

The LWTOOLS linker is called LWLINK. This chapter documents the various features of the linker.

4.1. Command Line Options

The binary for LWLINK is called "lwlink". Note that the binary is in lower case. lwlink takes the following command line arguments.

--decb, -b: Selects the DECB output format target. This is equivalent to --format=decb.
--output=FILE, -o FILE: This option specifies the name of the output file. If not specified, the default is a.out.
--format=TYPE, -f TYPE: This option specifies the output format. Valid values are decb and raw.
--raw, -r: This option specifies the raw output format. It is equivalent to --format=raw.
--script=FILE, -s: This option allows specifying a linking script to override the linker's built-in defaults.
--section-base=SECT=BASE: Cause section SECT to load at base address BASE. This will be prepended to the built-in link script. It is ignored if a link script is provided.
--map=FILE, -m FILE: This will output a description of the link result to FILE.
--library=LIBSPEC, -l LIBSPEC: Load a library using the library search path. LIBSPEC will have "lib" prepended and ".a" appended.
--library-path=DIR, -L DIR: Add DIR to the library search path.
--debug, -d: This option increases the debugging level. It is only useful for LWTOOLS developers.
--help, -?: This provides a listing of command line options and a brief description of each.
--usage: This will display a usage summary.
--version, -V: This will display the version of LWLINK.
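The --library and --library-path options work together. The Python sketch below shows one way LIBSPEC could be turned into a file name and located on the search path. The function find_library is hypothetical. The rule that directories are tried in the order they were added, with the first match winning, is an assumption and is not stated above.

```python
import os

def find_library(libspec, search_path):
    """Locate lib<LIBSPEC>.a on the library search path (hypothetical helper).

    Assumption: directories are tried in order and the first match wins.
    """
    filename = "lib" + libspec + ".a"      # "lib" prepended, ".a" appended
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None                            # not found on the search path
```

Under this sketch, "-L lib -l coco" would look for lib/libcoco.a.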


Chapter 5. Libraries and LWAR

LWTOOLS also includes a tool for managing libraries. These are analogous to the static libraries created with the "ar" tool on POSIX systems. Each library file contains one or more object files. The linker will treat the object files within a library as though they had been specified individually on the command line except when resolving external references. External references are looked up first within the object files in the same library and then, if not found, through the usual lookup based on the order in which the files are specified on the command line.

The tool for creating these library files is called LWAR.

5.1. Command Line Options

The binary for LWAR is called "lwar". Note that the binary is in lower case. The options lwar understands are listed below. For archive manipulation options, the first non-option argument is the name of the archive. All other non-option arguments are the names of files to operate on.

--add, -a: This option specifies that an archive is going to have files added to it. If the archive does not already exist, it is created. New files are added to the end of the archive.
--create, -c: This option specifies that an archive is going to be created and have files added to it. If the archive already exists, it is truncated.
--merge, -m: If specified, any files specified to be added to an archive will be checked to see if they are archives themselves. If so, their constituent members are added to the archive. This is useful for avoiding archives containing archives.
--list, -l: This will display a list of the files contained in the archive.
--debug, -d: This option increases the debugging level. It is only useful for LWTOOLS developers.
--help, -?: This provides a listing of command line options and a brief description of each.
--usage: This will display a usage summary.
--version, -V: This will display the version of LWAR.


Chapter 6. Object Files

LWTOOLS uses a proprietary object file format. It is proprietary in the sense that it is specific to LWTOOLS, not that it is a hidden format. It would be hard to keep it hidden in an open source tool chain anyway. This chapter documents the object file format.

An object file consists of a series of sections, each of which contains a list of exported symbols, a list of incomplete references, and a list of "local" symbols which may be used in calculating incomplete references. Each section will obviously also contain the object code.

Exported symbols must be completely resolved to an address within the section they are exported from. That is, an exported symbol must be a constant rather than defined in terms of other symbols.

Each object file starts with a magic number and version number. The magic number is the string "LWOBJ16" for this 16 bit object file format. The only defined version number is currently 0. Thus, the first 8 bytes of the object file are 4C574F424A313600 (hexadecimal).
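The seven ASCII bytes of "LWOBJ16", followed directly by the version byte 0, give exactly that hex string. The Python sketch below tests whether a file begins with this header. The name check_header is hypothetical.

```python
MAGIC = b"LWOBJ16"     # 7 ASCII bytes, followed directly by the version byte
VERSION = 0            # the only defined version

def check_header(data):
    """Return True if data starts with the LWOBJ16 magic and version 0."""
    return len(data) >= 8 and data[:7] == MAGIC and data[7] == VERSION
```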

Each section has the following items in order:

section name
flags
list of local symbols (and addresses within the section)
list of exported symbols (and addresses within the section)
list of incomplete references along with the expressions to calculate them
the actual object code (for non-BSS sections)

The section starts with the name of the section with a NUL termination followed by a series of flag bytes terminated by NUL. There are only two flag bytes defined. A NUL (0) indicates no more flags and a value of 1 indicates the section is a BSS section. For a BSS section, no actual code is included in the object file.

Either a NULL (empty) section name or the end of the file indicates that there are no more sections.

Each entry in the exported and local symbol tables consists of the symbol name (NUL terminated) followed by two bytes which contain the value in big endian order. The end of a symbol table is indicated by a NULL (empty) symbol name.
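The following Python sketch reads a section header and one symbol table. It is based only on the layout described above. The helper names are hypothetical. Ignoring flag values other than 1 is an assumption, and LWTOOLS itself may handle them differently.

```python
def read_cstring(data, pos):
    """Read a NUL-terminated string at pos; return (bytes, position after the NUL)."""
    end = data.index(0, pos)
    return data[pos:end], end + 1

def read_section_header(data, pos):
    """Parse a section name and its flag bytes.

    Returns (name, is_bss, new_pos), or None when there are no more sections
    (end of file or an empty section name).
    """
    if pos >= len(data):
        return None
    name, pos = read_cstring(data, pos)
    if name == b"":
        return None
    is_bss = False
    while data[pos] != 0:          # flag bytes run until a NUL
        if data[pos] == 1:         # 1 marks a BSS section; other values ignored here
            is_bss = True
        pos += 1
    return name, is_bss, pos + 1   # skip the terminating NUL

def read_symbol_table(data, pos):
    """Parse (name NUL, 16-bit big-endian value) entries until an empty name."""
    symbols = {}
    while True:
        name, pos = read_cstring(data, pos)
        if name == b"":            # empty name ends the table
            return symbols, pos
        symbols[name.decode("ascii")] = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
```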

Each entry in the incomplete references table consists of an expression followed by a 16 bit offset where the reference goes. Expressions are defined as a series of terms up to an "end of expression" term. Each term consists of a single byte which identifies the type of term (see below) followed by any data required by the term. The end of the list is flagged by a NULL expression (only an end of expression term).

Table 6-1. Object File Term Types

TERMTYPE	Meaning
00	end of expression
01	integer (16 bit in big endian order follows)
02	external symbol reference (NUL terminated symbol name follows)
03	local symbol reference (NUL terminated symbol name follows)
04	operator (1 byte operator number)
05	section base address reference

External references are resolved using other object files while local references are resolved using the local symbol table(s) from this file. This allows local symbols that are not exported to have the same names as exported symbols or external references.

Table 6-2. Object File Operator Numbers

Number	Operator
01	addition (+)
02	subtraction (-)
03	multiplication (*)
04	division (/)
05	modulus (%)
06	integer division (\) (same as division)
07	bitwise and
08	bitwise or
09	bitwise xor
0A	boolean and
0B	boolean or
0C	unary negation, 2's complement (-)
0D	unary 1's complement (^)

An expression is represented in a postfix manner with both operands for binary operators preceding the operator and the single operand for unary operators preceding the operator.
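Because expressions are postfix, they can be evaluated with a single stack. The Python sketch below walks the term types of Table 6-1 and the operator numbers of Table 6-2. The function eval_expression is hypothetical. Several details are assumptions not covered above: division rounds down (floor), the boolean operators yield 1 or 0, and the final result is truncated to 16 bits.

```python
import operator

# Binary operators keyed by object-file operator number (Table 6-2).
BINARY_OPS = {
    0x01: operator.add,
    0x02: operator.sub,
    0x03: operator.mul,
    0x04: operator.floordiv,                      # assumption: floor division
    0x05: operator.mod,
    0x06: operator.floordiv,                      # "\" is the same as division
    0x07: operator.and_,
    0x08: operator.or_,
    0x09: operator.xor,
    0x0A: lambda a, b: int(bool(a) and bool(b)), # boolean and
    0x0B: lambda a, b: int(bool(a) or bool(b)),  # boolean or
}
UNARY_OPS = {0x0C: operator.neg, 0x0D: operator.invert}  # 2's and 1's complement

def eval_expression(data, pos, externals, locals_, section_base):
    """Evaluate one postfix expression at data[pos]; return (value, position after it)."""
    stack = []
    while True:
        term = data[pos]
        pos += 1
        if term == 0x00:                          # end of expression
            break
        elif term == 0x01:                        # 16-bit big-endian integer
            stack.append(int.from_bytes(data[pos:pos + 2], "big"))
            pos += 2
        elif term in (0x02, 0x03):                # external or local symbol reference
            end = data.index(0, pos)
            name = data[pos:end].decode("ascii")
            pos = end + 1
            table = externals if term == 0x02 else locals_
            stack.append(table[name])
        elif term == 0x04:                        # operator, 1-byte operator number
            op = data[pos]
            pos += 1
            if op in UNARY_OPS:
                stack.append(UNARY_OPS[op](stack.pop()))
            else:
                right = stack.pop()               # right operand was pushed last
                left = stack.pop()
                stack.append(BINARY_OPS[op](left, right))
        elif term == 0x05:                        # section base address
            stack.append(section_base)
        else:
            raise ValueError("unknown term type %#x" % term)
    return stack.pop() & 0xFFFF, pos
```

For example, the terms 03 "tbl" 00, 01 00 02, 04 01, 00 encode the local reference tbl+2.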


LW Tool Chain

William Astle

Table of Contents

1. Introduction

1.1. History

2. Output Formats

2.1. Raw Binaries
2.2. DECB Binaries
2.3. Object Files

3. LWASM

3.1. Command Line Options

3.2. Dialects

3.3. Source Format

3.4. Symbols

3.5. Numbers and Expressions

3.6. Assembler Directives

3.6.1. Data Directives
3.6.2. Address Definition
3.6.3. Conditional Assembly
3.6.4. Miscellaneous Directives

3.7. Macros

3.8. Object Files and Sections

3.9. Assembler Modes and Pragmas

4. LWLINK

4.1. Command Line Options
4.2. Linker Operation
4.3. Linking Scripts

5. Libraries and LWAR

5.1. Command Line Options

6. Object Files

List of Tables
6-1. Object File Term Types
6-2. Object File Operator Numbers


2.2. DECB Binaries

DECB binaries are compatible with the LOADM command in Disk Extended Color Basic on the CoCo. They are also compatible with CLOADM from Extended Color Basic. These binaries include the load address of the binary as well as encoding an execution address. These binaries may contain multiple loadable sections, each of which has its own load address.

Each binary starts with a preamble. Each preamble is five bytes long. The first byte is zero. The next two bytes specify the number of bytes to load and the last two bytes specify the address to load the bytes at. Then, a string of bytes follows. After this string of bytes, there may be another preamble or a postamble. A postamble is also five bytes in length. The first byte of the postamble is $FF, the next two are zero, and the last two are the execution address for the binary.
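The following Python sketch builds a DECB image from a list of blocks. It assumes lengths and addresses are stored most significant byte first (the 6809's native byte order), which the description above does not state explicitly. The name decb_binary is hypothetical.

```python
def decb_binary(blocks, exec_addr):
    """Build a DECB (LOADM) image.

    blocks: list of (load_address, data_bytes) pairs, one preamble each.
    exec_addr: execution address stored in the postamble.
    """
    out = bytearray()
    for load_addr, data in blocks:
        out += bytes([0x00])                    # preamble marker
        out += len(data).to_bytes(2, "big")     # number of bytes to load
        out += load_addr.to_bytes(2, "big")     # load address
        out += data
    out += bytes([0xFF, 0x00, 0x00])            # postamble marker and two zero bytes
    out += exec_addr.to_bytes(2, "big")         # execution address
    return bytes(out)
```

For instance, decb_binary([(0x0E00, b"\x39")], 0x0E00) yields the 11 bytes 00 00 01 0E 00 39 FF 00 00 0E 00. That is one preamble loading a single RTS instruction at $0E00, followed by a postamble that executes it.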

Both LWASM and LWLINK can output this format.

2.3. Object Files

LWASM supports generating a proprietary object file format which is described in Chapter 6. LWLINK is then used to link these object files into a final binary in any of LWLINK's supported binary formats.

Object files are very flexible in that they allow references that are not known at assembly time to be resolved at link time. However, because the addresses of such references are not known, the assembler has no choice but to use sixteen bit addressing modes for these references. The linker will always use sixteen bits when resolving a reference, which means any instruction that requires an eight bit operand cannot use external references.

Object files also support the concept of sections which are not valid for other output types. This allows related code from each object file linked to be collapsed together in the final binary.


3.2. Dialects

LWASM supports all documented MC6809 instructions as defined by Motorola. It also supports all known HD6309 instructions. There is some variation, however, in the mnemonics used for the block transfer instructions. LWASM uses TFM for all four of them as do several other assemblers. Others, such as CCASM, use four separate opcodes for them (compare: copy+, copy-, implode, and explode). There are advantages to both methods. However, it seems like TFM has the most traction and thus, this is what LWASM supports. Support for such variations may be added in the future.

The standard addressing mode specifiers are supported. These are the hash sign ("#") for immediate mode, the less than sign ("<") for forced eight bit modes, and the greater than sign (">") for forced sixteen bit modes.

Additionally, LWASM supports using the asterisk ("*") to indicate base page addressing. This should not be used in hand-written source code, however, because it is non-standard and may or may not be present in future versions of LWASM.

3.3. Source Format

LWASM accepts plain text files in a relatively free form. It can handle lines terminated with CR, LF, CRLF, or LFCR which means it should be able to assemble files on any platform on which it compiles.
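One way to accept all four terminators is to try the two-character forms before the single characters, so that CRLF and LFCR each count as one line break. This Python sketch is illustrative only and is not LWASM's actual reader. The name split_source_lines is hypothetical.

```python
import re

# Two-character terminators first so CRLF and LFCR are one break, not two.
_EOL = re.compile(r"\r\n|\n\r|\r|\n")

def split_source_lines(text):
    """Split source text on CR, LF, CRLF, or LFCR line terminators."""
    lines = _EOL.split(text)
    if lines and lines[-1] == "":   # a final terminator does not start a new line
        lines.pop()
    return lines
```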

Each line may start with a symbol. If a symbol is present, there must not be any whitespace preceding it. It is legal for a line to contain nothing but a symbol.

The opcode is separated from the symbol by whitespace. If there is no symbol, there must be at least one whitespace character preceding it. If applicable, the operand follows separated by whitespace. Following the opcode and operand is an optional comment.

A comment can also be introduced with a * or a ;. The comment character is optional for end of statement comments. However, if a symbol is the only thing present on the line other than the comment, the comment character is mandatory to prevent the assembler from interpreting the comment as an opcode.

For compatibility with the output generated by some C preprocessors, LWASM will also ignore lines that begin with a #. This should not be used as a general comment character, however.

The opcode is not treated case sensitively. Neither are register names in the operand fields. Symbols, however, are case sensitive.

LWASM does not support line numbers in the file.

3.4. Symbols

Symbols have no length restriction. They may contain letters, numbers, dots, dollar signs, and underscores. They must start with a letter, dot, or underscore.

LWASM also supports the concept of a local symbol. A local symbol is one which contains either a "?" or a "@", which can appear anywhere in the symbol. The scope of a local symbol is determined by a number of factors. First, each included file gets its own local symbol scope. A blank line will also be considered a local scope barrier. Macros each have their own local symbol scope as well (which has a side effect that you cannot use a local symbol as an argument to a macro). There are other factors as well. In general, a local symbol is restricted to the block of code it is defined within.
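The naming rules above can be approximated with a regular expression. This Python sketch assumes "?" and "@" are the only extra characters beyond those listed. It also assumes they may appear anywhere, including first, as the local symbol rule suggests. It does not attempt to resolve any overlap with the "@" octal prefix. The function names are hypothetical.

```python
import re

# Assumption: "?" and "@" (local symbol markers) are allowed in any position;
# otherwise the first character must be a letter, dot, or underscore.
_SYMBOL = re.compile(r"[A-Za-z._?@][A-Za-z0-9._$?@]*")

def is_valid_symbol(name):
    """True if name follows the symbol character rules."""
    return _SYMBOL.fullmatch(name) is not None

def is_local_symbol(name):
    """A local symbol contains "?" or "@" anywhere in its name."""
    return "?" in name or "@" in name
```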

3.5. Numbers and Expressions

Numbers can be expressed in binary, octal, decimal, or hexadecimal. Binary numbers may be prefixed with a "%" symbol or suffixed with a "b" or "B". Octal numbers may be prefixed with "@" or suffixed with "Q", "q", "O", or "o". Hexadecimal numbers may be prefixed with "$", "0x" or "0X", or suffixed with "H". No prefix or suffix is required for decimal numbers but they can be prefixed with "&" if desired. Any constant which begins with a letter must be expressed with the correct prefix base identifier or be prefixed with a 0. Thus hexadecimal FF would have to be written either 0FFH or $FF. Numbers are not case sensitive.
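The prefix and suffix rules can be applied in a fixed order: prefixes first, then suffixes, then plain decimal. This Python sketch (parse_number is hypothetical) follows the rules above. It rejects anything starting with a letter, since the assembler would read that as a symbol. It does no further validation.

```python
def parse_number(text):
    """Parse one LWASM-style numeric constant and return its value."""
    t = text.lower()                       # numbers are not case sensitive
    if not t or t[0].isalpha():
        # Starts with a letter: the assembler would read this as a symbol.
        raise ValueError("not a number: %r" % text)
    # Prefix forms.
    if t.startswith("%"):
        return int(t[1:], 2)               # binary
    if t.startswith("@"):
        return int(t[1:], 8)               # octal
    if t.startswith("$"):
        return int(t[1:], 16)              # hexadecimal
    if t.startswith("0x"):
        return int(t[2:], 16)              # hexadecimal
    if t.startswith("&"):
        return int(t[1:], 10)              # decimal
    # Suffix forms.
    if t.endswith("h"):
        return int(t[:-1], 16)             # hexadecimal
    if t.endswith(("q", "o")):
        return int(t[:-1], 8)              # octal
    if t.endswith("b"):
        return int(t[:-1], 2)              # binary
    return int(t, 10)                      # plain decimal
```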

A symbol may appear at any point where a number is acceptable. The special symbol "*" can be used to represent the starting address of the current source line within expressions.

The ASCII value of a character can be included by prefixing it with a single quote ('). The ASCII values of two characters can be included by prefixing the characters with a double quote (").

LWASM supports the following basic binary operators: +, -, *, /, and %. These represent addition, subtraction, multiplication, division, and modulus. It also supports unary negation and unary 1's complement (- and ^ respectively). For completeness, a unary positive (+) is supported though it is a no-op.

Operator precedence follows the usual rules. Multiplication, division, and modulus take precedence over addition and subtraction. Unary operators take precedence over binary operators. To force a specific order of evaluation, parentheses can be used in the usual manner.
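A small recursive-descent evaluator shows how these precedence levels fit together. Each level calls the next higher one. This Python sketch is not LWASM's parser, and the function name evaluate is hypothetical. It handles only decimal integer operands, does no error checking, and assumes floor division.

```python
import re

# Decimal integers, the operators + - * / % ^, and parentheses.
_TOKEN = re.compile(r"\s*(\d+|[-+*/%^()])")

def evaluate(expr):
    """Evaluate an integer expression: unary ops > * / % > binary + -."""
    tokens = _TOKEN.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def unary():
        tok = take()
        if tok == "-":
            return -unary()              # 2's complement negation
        if tok == "^":
            return ~unary()              # 1's complement
        if tok == "+":
            return unary()               # unary plus is a no-op
        if tok == "(":
            value = additive()
            take()                       # consume the closing ")"
            return value
        return int(tok)

    def multiplicative():
        value = unary()
        while peek() in ("*", "/", "%"):
            op = take()
            rhs = unary()
            if op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs            # assumption: floor division
            else:
                value %= rhs
        return value

    def additive():
        value = multiplicative()
        while peek() in ("+", "-"):      # left to right
            op = take()
            rhs = multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    return additive()
```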

3.6. Assembler Directives

Various directives can be used to control the behaviour of the assembler or to include non-code/data in the resulting output. Those directives that are not described in detail in other sections of this document are described below.

3.6.1. Data Directives

FCB expr[,...], .DB expr[,...], .BYTE expr[,...]: Include one or more constant bytes (separated by commas) in the output.
FDB expr[,...], .DW expr[,...], .WORD expr[,...]: Include one or more words (separated by commas) in the output.
FQB expr[,...], .QUAD expr[,...], .4BYTE expr[,...]: Include one or more double words (separated by commas) in the output.
FCC string, .ASCII string, .STR string: Include a string of text in the output. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string. The string is included with no modifications.
FCN string, .ASCIZ string, .STRZ string: Include a NUL terminated string of text in the output. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string. A NUL byte is automatically appended to the string.
FCS string, .ASCIS string, .STRS string: Include a string of text in the output with bit 7 of the final byte set. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string.
ZMB expr: Include a number of NUL bytes in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
ZMD expr: Include a number of zero words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
ZMQ expr: Include a number of zero double-words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
RMB expr, .BLKB expr, .DS expr, .RS expr: Reserve a number of bytes in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the bytes is undefined.
RMD expr: Reserve a number of words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the words is undefined.
RMQ expr: Reserve a number of double-words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the double-words is undefined.
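The three string directives differ only in what happens after the delimited text is copied. This Python sketch (string_directive_bytes is a hypothetical name) shows the bytes each would emit. It assumes the string contains only ASCII characters.

```python
def string_directive_bytes(directive, operand):
    """Return the bytes FCC, FCN, or FCS would emit for a delimited string."""
    delim = operand[0]                          # first character is the delimiter
    body = operand[1:-1]
    if len(operand) < 2 or operand[-1] != delim or delim in body:
        raise ValueError("badly delimited string: %r" % operand)
    data = bytearray(body.encode("ascii"))      # assumption: ASCII text only
    if directive == "FCN":
        data.append(0)                          # NUL terminator appended
    elif directive == "FCS":
        if data:
            data[-1] |= 0x80                    # set bit 7 of the final byte
    elif directive != "FCC":                    # FCC: included with no modification
        raise ValueError("unsupported directive: " + directive)
    return bytes(data)
```

For example, FCS /HI/ produces the bytes $48 $C9, because "I" ($49) has bit 7 set.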

3.6.2. Address Definition

The directives in this section all control the addresses of symbols or the assembly process itself.

ORG expr

Set the assembly address. The address must be fully resolvable on the first pass so no external or forward references are permitted. ORG is not permitted within sections when outputting to object files. For the DECB target, each ORG directive after which output is generated will cause a new preamble to be output. ORG is only used to determine the addresses of symbols when the raw target is used.

sym EQU expr, sym = expr

Define the value of sym to be expr.

sym SET expr

Define the value of sym to be expr. Unlike EQU, SET permits symbols to be defined multiple times as long as SET is used for all instances. Use of the symbol before the first SET statement that sets its value is undefined.

SETDP expr

Inform the assembler that it can assume the DP register contains expr. This directive is only advice to the assembler to determine whether an address is in the direct page and has no effect on the contents of the DP register. The value must be fully resolved during the first assembly pass because it affects the sizes of subsequent instructions.

This directive has no effect in the object file target.
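On the 6809, direct addressing supplies only the low byte of an address, and the CPU takes the high byte from DP. A plausible way for the assembler to use the SETDP value is therefore to compare it with the high byte of an operand's address, as in this Python sketch. The function names are hypothetical, and the forced "<" and ">" modes are not modelled.

```python
def can_use_direct(address, dp):
    """True if a 16-bit address lies in the page the assembler assumes DP holds."""
    return (address >> 8) & 0xFF == dp & 0xFF

def address_operand_size(address, dp):
    """Bytes needed for the address operand: 1 for direct, 2 for extended."""
    return 1 if can_use_direct(address, dp) else 2
```

A wrong SETDP value only changes which form is chosen. It does not change DP itself, which is why a mismatch between SETDP and the real DP contents leads to incorrect code rather than an error.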

ALIGN expr

Force the current assembly address to be a multiple of expr. A series of NUL bytes is output to force the alignment, if required. The alignment value must be fully resolved on the first pass because it affects the addresses of subsequent instructions.

This directive is not suitable for inclusion in the middle of actual code. It is intended to appear where the bytes output will not be executed.
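The number of padding bytes follows directly from the current address and the alignment. This Python sketch (align_padding is a hypothetical name) computes it.

```python
def align_padding(address, alignment):
    """Number of NUL bytes ALIGN must emit so address becomes a multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (-address) % alignment
```

For example, ALIGN 4 at address $1001 would emit three NUL bytes, so the next byte lands at $1004. At an address that is already aligned, nothing is emitted.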

3.6.3. Conditional Assembly

Portions of the source code can be excluded or included based on conditions known at assembly time. Conditionals can be nested arbitrarily deeply. The directives associated with conditional assembly are described in this section.

All conditionals must be fully bracketed. That is, every conditional statement must eventually be followed by an ENDC at the same level of nesting.

Conditional expressions are only evaluated on the first assembly pass. It is not possible to game the assembly process by having a conditional change its value between assembly passes. Thus there is not and never will be any equivalent of IFP1 or IFP2 as provided by other assemblers.

IFEQ expr: If expr evaluates to zero, the conditional will be considered true.
IFNE expr, IF expr: If expr evaluates to a non-zero value, the conditional will be considered true.
IFGT expr: If expr evaluates to a value greater than zero, the conditional will be considered true.
IFGE expr: If expr evaluates to a value greater than or equal to zero, the conditional will be considered true.
IFLT expr: If expr evaluates to a value less than zero, the conditional will be considered true.
IFLE expr: If expr evaluates to a value less than or equal to zero, the conditional will be considered true.
IFDEF sym: If sym is defined at this point in the assembly process, the conditional will be considered true.
IFNDEF sym: If sym is not defined at this point in the assembly process, the conditional will be considered true.
ELSE: If the preceding conditional at the same level of nesting was false, the statements following will be assembled. If the preceding conditional at the same level was true, the statements following will not be assembled. Note that the preceding conditional might have been another ELSE statement although this behaviour is not guaranteed to be supported in future versions of LWASM.
ENDC: This directive marks the end of a conditional construct. Every conditional construct must end with an ENDC directive.
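The nesting rules can be modelled with a stack holding the truth value of each open conditional. A line is assembled only when every level is true. This Python sketch filters already-parsed (directive, argument) pairs. CONDITIONS and assemble_lines are hypothetical names. The sketch assumes expressions have already been evaluated to signed integers and that the set of defined symbols is fixed.

```python
# Tests for the IFxx directives that take an evaluated expression.
CONDITIONS = {
    "IFEQ": lambda v: v == 0,
    "IFNE": lambda v: v != 0,
    "IF":   lambda v: v != 0,
    "IFGT": lambda v: v > 0,
    "IFGE": lambda v: v >= 0,
    "IFLT": lambda v: v < 0,
    "IFLE": lambda v: v <= 0,
}

def assemble_lines(lines, defined_symbols):
    """Yield the (directive, argument) pairs that survive conditional assembly."""
    stack = []                                  # truth value of each open conditional
    for directive, arg in lines:
        if directive in CONDITIONS:
            stack.append(CONDITIONS[directive](arg))
        elif directive == "IFDEF":
            stack.append(arg in defined_symbols)
        elif directive == "IFNDEF":
            stack.append(arg not in defined_symbols)
        elif directive == "ELSE":
            stack[-1] = not stack[-1]           # flip the innermost conditional
        elif directive == "ENDC":
            stack.pop()                         # close the innermost conditional
        elif all(stack):                        # every enclosing level must be true
            yield directive, arg
    if stack:
        raise ValueError("conditional without matching ENDC")
```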

3.6.4. Miscellaneous Directives

This section includes directives that do not fit into the other categories.

INCLUDE filename

Include the contents of filename at this point in the assembly as though it were a part of the file currently being processed. Note that whitespace cannot appear in the name of the file.

END [expr]

This directive causes the assembler to stop assembling immediately as though it ran out of input. For the DECB target only, expr can be used to set the execution address of the resulting binary. For all other targets, specifying expr will cause an error.

ERROR string

Causes a custom error message to be printed at this line. This will cause assembly to fail. This directive is most useful inside conditional constructs to cause assembly to fail if some condition that is known bad happens.

.MODULE string

This directive is ignored for most output targets. If the output target supports encoding a module name into it, string will be used as the module name.

As of version 2.2, no supported output targets support this directive.

3.7. Macros

LWASM is a macro assembler. A macro is simply a name that stands in for a series of instructions. Once a macro is defined, it is used like any other assembler directive. Defining a macro can be considered equivalent to adding additional assembler directives.

Macros may accept parameters. These parameters are referenced within a macro by a backslash ("\") followed by a digit 1 through 9 for the first through ninth parameters. They may also be referenced by enclosing the decimal parameter number in braces ("{num}"). These parameter references are replaced with the verbatim text of the parameter passed to the macro. A reference to a non-existent parameter will be replaced by an empty string. Macro parameters are expanded everywhere on each source line. That means the parameter to a macro could be used as a symbol or it could even appear in a comment or could cause an entire source line to be commented out when the macro is expanded.

Parameters passed to a macro are separated by commas and the parameter list is terminated by any whitespace. This means that neither a comma nor whitespace may be included in a macro parameter.
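The argument splitting and textual substitution described above can be sketched in a few lines of Python. The names split_macro_args and expand_line are hypothetical, and this is not LWASM's implementation. Note that substitution is purely textual and applies to the whole line, comments included.

```python
import re

# \1 through \9, or a decimal parameter number in braces such as {12}.
_PARAM_REF = re.compile(r"\\([1-9])|\{(\d+)\}")

def split_macro_args(operand_field):
    """Split a macro's argument list: comma separated, ended by whitespace."""
    words = operand_field.split(None, 1)       # text before the first whitespace run
    return words[0].split(",") if words else []

def expand_line(line, args):
    """Replace every parameter reference in one macro body line."""
    def substitute(match):
        index = int(match.group(1) or match.group(2))
        # A reference to a non-existent parameter becomes an empty string.
        return args[index - 1] if 1 <= index <= len(args) else ""
    return _PARAM_REF.sub(substitute, line)
```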

Macro expansion is done recursively. That is, within a macro, macros are expanded. This can lead to infinite loops in macro expansion. If the assembler hangs for a long time while assembling a file that uses macros, this may be the reason.

Each macro expansion receives its own local symbol context which is not inherited by any macros called by it nor is it inherited from the context the macro was instantiated in. That means it is possible to use local symbols within macros without having them collide with symbols in other macros or outside the macro itself. However, this also means that using a local symbol as a parameter to a macro, while legal, will not do what it would seem to do as it will result in looking up the local symbol in the macro's symbol context rather than the enclosing context where it came from, likely yielding either an undefined symbol error or bizarre assembly results.

Note that there is no way to define a macro as local to a symbol context. All macros are part of the global macro namespace. However, macros have a separate namespace from symbols so it is possible to have a symbol with the same name as a macro.

Macros are defined only during the first pass. Macro expansion also only occurs during the first pass. On the second pass, the macro definition is simply ignored. Macros must be defined before they are used.

The following directives are used when defining macros.

macroname MACRO: This directive is used to begin the definition of a macro called macroname. If macroname already exists, it is considered an error. Attempting to define a macro within a macro is undefined. It may work and it may not so the behaviour should not be relied upon.
ENDM: This directive indicates the end of the macro currently being defined. It causes the assembler to resume interpreting source lines as normal.

3.8. Object Files and Sections

The object file target is very useful for large projects because it allows multiple files to be assembled independently and then linked into the final binary at a later time. It allows only the small portion of the project that was modified to be re-assembled rather than requiring the entire set of source code to be available to the assembler in a single assembly process. This can be particularly important if there are a large number of macros, symbol definitions, or other metadata that uses resources at assembly time. By far the largest benefit, however, is keeping the source files small enough for a mere mortal to find things in them.

With multi-file projects, there needs to be a means of resolving references to symbols in other source files. These are known as external references. The addresses of these symbols cannot be known until the linker joins all the object files into a single binary. This means that the assembler must be able to output the object code without knowing the value of the symbol. This places some restrictions on the code generated by the assembler. For example, the assembler cannot generate direct page addressing for instructions that reference external symbols because the address of the symbol may not be in the direct page. Similarly, relative branches and PC relative addressing cannot be used in their eight bit forms. Everything that must be resolved by the linker must be assembled to use the largest address size possible to allow the linker to fill in the correct value at link time. Note that the same problem applies to absolute address references as well, even those in the same source file, because the address is not known until link time.

In multi-file projects, it is often desirable to have code of a given type grouped together in the final binary generated by the linker. The same applies to data. In order for the linker to do that, the bits that are to be grouped must be tagged in some manner. This is where the concept of sections comes in. Each chunk of code or data is part of a section in the object file. Then, when the linker reads all the object files, it coalesces all sections of the same name into a single section and then considers it as a unit.

The existence of sections, however, raises a problem for symbols even within the same source file, because the placement of each section is not decided until link time. Thus, the assembler must treat symbols from different sections within the same source file in the same manner as external symbols. That is, it must leave them for the linker to resolve at link time, with all the limitations that entails.

In the object file target mode, LWASM requires all source lines that cause bytes to be output to be inside a section. Any directives that do not cause any bytes to be output can appear outside of a section. This includes such things as EQU or RMB. Even ORG can appear outside a section. ORG, however, makes no sense within a section because it is the linker that determines the starting address of the section's code, not the assembler.

All symbols defined globally in the assembly process are local to the source file and cannot be exported. All symbols defined within a section are considered local to the source file unless otherwise explicitly exported. Symbols referenced from external source files must be declared external, either explicitly or by asking the assembler to assume that all undefined symbols are external.

It is often handy to define a number of memory addresses that will be used for data at run-time but which need not be included in the binary file. These memory addresses are not initialized until run-time, either by the program itself or by the program loader, depending on the operating environment. Such sections are often known as BSS sections. LWASM supports generating sections with a BSS attribute set. For such a section, the object file contains the section definition, including the symbols exported from that section and those symbols required to resolve references from the local file, but no actual code. It is illegal for any source lines within a BSS flagged section to cause any bytes to be output.

The following directives apply to section handling.

SECTION name[,flags], SECT name[,flags], .AREA name[,flags]

Instructs the assembler that the code following this directive is to be considered part of the section name. A section name may appear multiple times in which case it is as though all the code from all the instances of that section appeared adjacent within the source file. However, flags may only be specified on the first instance of the section.

There is a single flag supported in flags. The flag bss will cause the section to be treated as a BSS section and, thus, no code will be included in the object file nor will any bytes be permitted to be output.

If the section name is "bss" or ".bss" in any combination of upper and lower case, the section is assumed to be a BSS section. In that case, the flag !bss can be used to override this assumption.

If assembly is already happening within a section, the section is implicitly ended and the new section started. This is not considered an error although it is recommended that all sections be explicitly closed.
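The BSS rules for SECTION can be summarized in a small model. The following Python sketch is hypothetical (SectionTable and its open method are illustrative names, not part of LWASM); it assumes flags are compared exactly as written and that an unknown flag is an error, neither of which the text states.

```python
class SectionTable:
    """Hypothetical model of how SECTION operands decide the BSS attribute."""

    def __init__(self):
        self.bss = {}  # section name -> True if the section is a BSS section

    def open(self, operand):
        """Handle a SECTION operand such as "code", ".bss" or "data,bss"."""
        name, *flags = [part.strip() for part in operand.split(",")]
        if name in self.bss:
            # Flags may only be given on the first instance of a section.
            if flags:
                raise ValueError(f"flags given on later instance of {name}")
            return name
        # A section named "bss" or ".bss" in any case is assumed to be BSS.
        is_bss = name.lower() in ("bss", ".bss")
        for flag in flags:
            if flag == "bss":
                is_bss = True
            elif flag == "!bss":
                is_bss = False  # override the name-based assumption
            else:
                raise ValueError(f"unsupported section flag: {flag}")  # assumed
        self.bss[name] = is_bss
        return name
```

Reopening a section by name with no flags simply returns to it, matching the rule that repeated instances behave as if they were adjacent in the source file.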

ENDSECTION, ENDSECT, ENDS

This directive ends the current section. This puts assembly outside of any sections until the next SECTION directive.

sym EXTERN, sym EXTERNAL, sym IMPORT

This directive defines sym as an external symbol. This directive may occur at any point in the source code. EXTERN definitions are resolved on the first pass so an EXTERN definition anywhere in the source file is valid for the entire file. The use of this directive is optional when the assembler is instructed to assume that all undefined symbols are external. In fact, in that mode, if the symbol is referenced before the EXTERN directive, an error will occur.

sym EXPORT, sym .GLOBL, EXPORT sym, .GLOBL sym

This directive defines sym as an exported symbol. This directive may occur at any point in the source code, even before the definition of the exported symbol.

Note that sym may appear as the operand or as the statement's symbol. If there is a symbol on the statement, that will take precedence over any operand that is present.

3.9. Assembler Modes and Pragmas

There are a number of options that affect the way assembly is performed. Some of these options can only be specified on the command line because they determine something absolute about the assembly process. These include such things as the output target. Other things may be switchable during the assembly process. These are known as pragmas and are, by definition, not portable between assemblers.

LWASM supports a number of pragmas that affect code generation or otherwise affect the behaviour of the assembler. These may be specified by way of a command line option or by assembler directives. The directives are as follows.

PRAGMA pragma[,...]: Specifies that the assembler should bring into force all pragmas specified. Any unrecognized pragma will cause an assembly error. The new pragmas will take effect immediately. This directive should be used when the program will assemble incorrectly if the pragma is ignored or not supported.
*PRAGMA pragma[,...]: This is identical to the PRAGMA directive except no error will occur with unrecognized or unsupported pragmas. This directive, by virtue of starting with a comment character, will also be ignored by assemblers that do not support this directive. Use this variation if the pragma is not required for correct functioning of the code.

Each pragma supported has a positive version and a negative version. The positive version enables the pragma while the negative version disables it. The negative version is simply the positive version with "no" prefixed to it. For instance, "pragma" vs. "nopragma". Only the positive version is listed below.

Pragmas are not case sensitive.
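As a sketch of the positive/negative naming scheme, the hypothetical Python helper below applies a PRAGMA operand to a table of settings. It knows only the three pragmas listed in this section; the default for cescapes is assumed to be off since the text does not give one.

```python
# index0tonone defaults to on and undefextern to off (per the text);
# cescapes defaulting to off is an assumption of this sketch.
DEFAULT_PRAGMAS = {"index0tonone": True, "cescapes": False, "undefextern": False}


def apply_pragmas(state, operand, strict=True):
    """Apply a comma separated pragma list such as "cescapes,noindex0tonone".

    `state` should start as a copy of DEFAULT_PRAGMAS.
    strict=True models PRAGMA (unknown names are errors);
    strict=False models *PRAGMA (unknown names are silently ignored).
    """
    for item in operand.split(","):
        word = item.strip().lower()  # pragmas are not case sensitive
        value = True
        # The negative form is the positive name with "no" prefixed.
        if word not in state and word.startswith("no") and word[2:] in state:
            word, value = word[2:], False
        if word not in state:
            if strict:
                raise ValueError(f"unrecognized pragma: {item.strip()}")
            continue
        state[word] = value
    return state
```

For example, PRAGMA CEscapes,noindex0tonone would turn cescapes on and index0tonone off under this model.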

index0tonone

When in force, this pragma enables an optimization affecting indexed addressing modes. When the offset expression in an indexed mode evaluates to zero but is not explicitly written as 0, this will replace the operand with the equivalent no offset mode, thus creating slightly faster code. Because of the advantages of this optimization, it is enabled by default.

cescapes

This pragma will cause strings in the FCC, FCS, and FCN pseudo operations to have C-style escape sequences interpreted. The one departure from the C standard is that unrecognized escape sequences will return either the character immediately following the backslash or some undefined value. Do not rely on the behaviour of undefined escape sequences.

undefextern

This pragma is only valid for targets that support external references. When in force, if the assembler sees an undefined symbol on the second pass, it will automatically define it as an external symbol. This automatic definition will apply for the remainder of the assembly process, even if the pragma is subsequently turned off. Because this behaviour could be surprising, this pragma defaults to off.

The primary use for this pragma is for projects that share a large number of symbols between source files. In such cases, it is impractical to enumerate all the external references in every source file. This allows the assembler and linker to do the heavy lifting while not preventing a particular source module from defining a local symbol of the same name as an external symbol if it does not need the external symbol. (This pragma will not cause an automatic external definition if there is already a locally defined symbol.)

This pragma will often be specified on the command line for large projects. However, depending on the specific dynamics of the project, it may be sufficient for one or two files to use this pragma internally.

Chapter 4. LWLINK

The LWTOOLS linker is called LWLINK. This chapter documents the various features of the linker.

4.1. Command Line Options

The binary for LWLINK is called "lwlink". Note that the binary is in lower case. lwlink takes the following command line arguments.

--decb, -b: Selects the DECB output format target. This is equivalent to --format=decb.
--output=FILE, -o FILE: This option specifies the name of the output file. If not specified, the default is a.out.
--format=TYPE, -f TYPE: This option specifies the output format. Valid values are decb and raw.
--raw, -r: This option specifies the raw output format. It is equivalent to --format=raw.
--script=FILE, -s: This option allows specifying a linking script to override the linker's built-in defaults.
--section-base=SECT=BASE: Cause section SECT to load at base address BASE. This will be prepended to the built-in link script. It is ignored if a link script is provided.
--map=FILE, -m FILE: This will output a description of the link result to FILE.
--library=LIBSPEC, -l LIBSPEC: Load a library using the library search path. LIBSPEC will have "lib" prepended and ".a" appended.
--library-path=DIR, -L DIR: Add DIR to the library search path.
--debug, -d: This option increases the debugging level. It is only useful for LWTOOLS developers.
--help, -?: This provides a listing of command line options and a brief description of each.
--usage: This will display a usage summary.
--version, -V: This will display the version of LWLINK.
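The --library and --library-path options combine into a file lookup. This Python sketch is hypothetical: find_library is not part of LWLINK, and it assumes directories are searched in the order they were given, stopping at the first match.

```python
import os


def find_library(libspec, search_path, exists=os.path.isfile):
    """Return the file a -l LIBSPEC option would resolve to, or None.

    The file name is "lib" + LIBSPEC + ".a"; each -L directory is tried in turn.
    `exists` is injectable so the lookup can be tried without real files.
    """
    filename = "lib" + libspec + ".a"
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None
```

Under these assumptions, -L /a -L /b -l foo would look for /a/libfoo.a and then /b/libfoo.a.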

4.2. Linker Operation

LWLINK takes one or more files in supported input formats and links them into a single binary. Currently supported formats are the LWTOOLS object file format and the archive format used by LWAR. While the precise method is slightly different, linking can be conceptualized as the following steps.

First, the linker loads a linking script. If no script is specified, it loads a built-in default script based on the output format selected. This script tells the linker how to lay out the various sections in the final binary.
Next, the linker reads all the input files into memory. At this time, it flags any format errors in those files and constructs a table of symbols for each object.
The linker then proceeds with organizing the sections loaded from each file according to the linking script. As it does so, it is able to assign addresses to each symbol defined in each object file. At this time, the linker may also collapse different instances of the same section name into a single section by appending the data from each subsequent instance of the section to the first instance of the section.
Next, the linker looks through every object file for every incomplete reference. It then attempts to fully resolve that reference. If it cannot do so, it throws an error. Once a reference is resolved, the value is placed into the binary code at the specified location within the section. It should be noted that an incomplete reference can reference either a symbol internal to the object file or an external symbol which is in the export list of another object file.
If all of the above steps are successful, the linker opens the output file and actually constructs the binary.
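The steps above can be illustrated with a deliberately simplified Python model. SectionPiece and link are hypothetical names, not LWLINK internals. The sketch treats every symbol as exported, patches each reference with a plain 16-bit big endian symbol value (real object files carry full expressions, as described in Chapter 6), and takes the section order as a list in place of a linking script.

```python
from dataclasses import dataclass, field


@dataclass
class SectionPiece:
    """One instance of a section as read from one object file (simplified)."""
    name: str
    code: bytes
    exports: dict = field(default_factory=dict)  # symbol -> offset in this piece
    refs: list = field(default_factory=list)     # (offset, symbol) to patch


def link(objects, order, base):
    """Coalesce, lay out, and resolve; `objects` is a list of piece lists."""
    # Coalesce: append each later instance of a section to the first one.
    merged = {}
    for pieces in objects:
        for piece in pieces:
            data, exports, refs = merged.setdefault(piece.name, (bytearray(), {}, []))
            start = len(data)
            for sym, off in piece.exports.items():
                if sym in exports:
                    raise ValueError(f"duplicate symbol {sym}")
                exports[sym] = start + off
            refs.extend((start + off, sym) for off, sym in piece.refs)
            data.extend(piece.code)

    # Lay sections out in order and assign an address to every symbol.
    symtab, addr, layout = {}, base, []
    for name in order:
        # A missing section behaves as if it existed with zero size.
        data, exports, refs = merged.get(name, (bytearray(), {}, []))
        for sym, off in exports.items():
            if sym in symtab:
                raise ValueError(f"duplicate symbol {sym}")
            symtab[sym] = addr + off
        layout.append((data, refs))
        addr += len(data)

    # Resolve every incomplete reference, then build the binary.
    binary = bytearray()
    for data, refs in layout:
        for off, sym in refs:
            if sym not in symtab:
                raise ValueError(f"unresolved reference to {sym}")
            data[off:off + 2] = (symtab[sym] & 0xFFFF).to_bytes(2, "big")
        binary.extend(data)
    return bytes(binary), symtab
```

With one object holding a JMP (opcode 7E, extended) to start in section code and another defining start at an RTS (39) in the same section, linking at 1000 hex yields 7E 10 03 39.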

4.3. Linking Scripts

A linker script is used to instruct the linker about how to assemble the various sections into a completed binary. It consists of a series of directives which are considered in the order they are encountered.

The sections will appear in the resulting binary in the order they are specified in the script file. If a referenced section is not found, the linker will behave as though the section did exist but had a zero size, no relocations, and no exports. A section should only be referenced once. Any subsequent references will have an undefined effect.

All numbers in linking scripts are specified in hexadecimal. All directives are case sensitive although the hexadecimal numbers are not.

A section name can be specified as "*", in which case any section not already matched by the script will be matched. The "*" can also be followed by a comma and a flag to narrow the match. If the flag is "!bss", then any section that is not flagged as a bss section will be matched. If the flag is "bss", then any section that is flagged as bss will be matched.

The following directives are understood in a linker script.

section name load addr

This causes the section name to load at addr. For the raw target, only one "load at" entry is allowed for non-bss sections and it must be the first one. For raw targets, it affects the addresses the linker assigns to symbols but has no other effect on the output. bss sections may all have separate load addresses but since they will not appear in the binary anyway, this is okay.

For the decb target, each "load" entry will cause a new "block" to be output to the binary which will contain the load address. It is legal for sections to overlap in this manner; the linker assumes the loader will sort everything out.

section name

This will cause the section name to load after the previously listed section.

exec addr or sym

This will cause the execution address (entry point) to be the address specified (in hex) or the specified symbol name. The symbol name must match a symbol that is exported by one of the object files being linked. This has no effect for targets that do not encode the entry point into the resulting file. If not specified, the entry point is assumed to be address 0, which is probably not what you want. The default link scripts for targets that support this directive automatically set the entry point to the beginning of the first section (usually "init" or "code") that is emitted in the binary.

pad size

This will cause the output file to be padded with NUL bytes to be exactly size bytes in length. This only makes sense for a raw target.

Chapter 5. Libraries and LWAR

The tool for creating library files is called LWAR.

5.1. Command Line Options

--add, -a: This option specifies that an archive is going to have files added to it. If the archive does not already exist, it is created. New files are added to the end of the archive.
--create, -c: This option specifies that an archive is going to be created and have files added to it. If the archive already exists, it is truncated.
--merge, -m: If specified, any files specified to be added to an archive will be checked to see if they are archives themselves. If so, their constituent members are added to the archive. This is useful for avoiding archives containing archives.
--list, -l: This will display a list of the files contained in the archive.
--debug, -d: This option increases the debugging level. It is only useful for LWTOOLS developers.
--help, -?: This provides a listing of command line options and a brief description of each.
--usage: This will display a usage summary.
--version, -V: This will display the version of LWAR.

Chapter 6. Object Files

Exported symbols must be completely resolved to an address within the section they are exported from. That is, an exported symbol must be a constant rather than defined in terms of other symbols.

Each section has the following items in order:

section name
flags
list of local symbols (and addresses within the section)
list of exported symbols (and addresses within the section)
list of incomplete references along with the expressions to calculate them
the actual object code (for non-BSS sections)

Either a NULL section name or the end of the file indicates that there are no more sections.

Table 6-1. Object File Term Types

TERMTYPE	Meaning
00	end of expression
01	integer (16 bit in big endian order follows)
02	external symbol reference (NUL terminated symbol name follows)
03	local symbol reference (NUL terminated symbol name follows)
04	operator (1 byte operator number)
05	section base address reference

Table 6-2. Object File Operator Numbers

Number	Operator
01	addition (+)
02	subtraction (-)
03	multiplication (*)
04	division (/)
05	modulus (%)
06	integer division (\) (same as division)
07	bitwise and
08	bitwise or
09	bitwise xor
0A	boolean and
0B	boolean or
0C	unary negation, 2's complement (-)
0D	unary 1's complement (^)

An expression is represented in a postfix manner with both operands for binary operators preceding the operator and the single operand for unary operators preceding the operator.
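Because the encoding is postfix, an expression can be evaluated with a simple stack. The Python sketch below is a hypothetical illustration of Table 6-1 and Table 6-2, not LWLINK's code; it assumes the first operand pushed is the left operand, that division truncates toward zero, that the boolean operators yield 1 or 0, and that the final value wraps to 16 bits.

```python
def _cdiv(a, b):
    """Division truncating toward zero (assumed C-like semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


BINARY_OPS = {
    0x01: lambda a, b: a + b,
    0x02: lambda a, b: a - b,
    0x03: lambda a, b: a * b,
    0x04: _cdiv,
    0x05: lambda a, b: a - b * _cdiv(a, b),  # modulus
    0x06: _cdiv,                             # integer division, same as division
    0x07: lambda a, b: a & b,
    0x08: lambda a, b: a | b,
    0x09: lambda a, b: a ^ b,
    0x0A: lambda a, b: int(bool(a) and bool(b)),
    0x0B: lambda a, b: int(bool(a) or bool(b)),
}
UNARY_OPS = {
    0x0C: lambda a: -a,  # 2's complement negation
    0x0D: lambda a: ~a,  # 1's complement
}


def evaluate(expr, local_syms, external_syms, section_base):
    """Evaluate one postfix expression encoded with the term types of Table 6-1."""
    stack, i = [], 0
    while True:
        term = expr[i]
        i += 1
        if term == 0x00:  # end of expression
            break
        if term == 0x01:  # 16 bit big endian integer
            stack.append(int.from_bytes(expr[i:i + 2], "big"))
            i += 2
        elif term in (0x02, 0x03):  # NUL terminated symbol name follows
            end = expr.index(0, i)
            name = expr[i:end].decode("ascii")
            i = end + 1
            table = external_syms if term == 0x02 else local_syms
            stack.append(table[name])
        elif term == 0x04:  # 1 byte operator number follows
            op = expr[i]
            i += 1
            if op in UNARY_OPS:
                stack.append(UNARY_OPS[op](stack.pop()))
            elif op in BINARY_OPS:
                right = stack.pop()
                left = stack.pop()
                stack.append(BINARY_OPS[op](left, right))
            else:
                raise ValueError(f"unknown operator {op:#04x}")
        elif term == 0x05:  # section base address
            stack.append(section_base)
        else:
            raise ValueError(f"unknown term type {term:#04x}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0] & 0xFFFF  # assumed: results wrap to 16 bits
```

Under these assumptions, the reference start+2 to an external symbol would be encoded as 02 "start" 00, then 01 00 02, then 04 01, then 00.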

3.2. Dialects

3.3. Source Format

Each line may start with a symbol. If a symbol is present, there must not be any whitespace preceding it. It is legal for a line to contain nothing but a symbol.

For compatibility with the output generated by some C preprocessors, LWASM will also ignore lines that begin with a #. This should not be used as a general comment character, however.

The opcode is not treated case sensitively. Neither are register names in the operand fields. Symbols, however, are case sensitive.

LWASM does not support line numbers in the file.

3.4. Symbols

Symbols have no length restriction. They may contain letters, numbers, dots, dollar signs, and underscores. They must start with a letter, dot, or underscore.
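The naming rule can be expressed as a single regular expression. This Python check is a sketch, not part of LWASM, and assumes "letters" means ASCII letters.

```python
import re

# Letters, digits, dots, dollar signs and underscores; must start with a
# letter, dot or underscore. Restricting letters to ASCII is an assumption.
SYMBOL_RE = re.compile(r"[A-Za-z._][A-Za-z0-9._$]*")


def is_valid_symbol(text):
    """Return True if `text` is a legal symbol under the rules above."""
    return SYMBOL_RE.fullmatch(text) is not None
```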

3.5. Numbers and Expressions

A symbol may appear at any point where a number is acceptable. The special symbol "*" can be used to represent the starting address of the current source line within expressions.

The ASCII value of a character can be included by prefixing it with a single quote ('). The ASCII values of two characters can be included by prefixing the characters with a double quote (").
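A sketch of the values these constants produce follows. char_constant is a hypothetical Python helper; placing the first of two characters in the high byte is an assumption, chosen to match the big endian byte order of the 6809.

```python
def char_constant(token):
    """Value of a 'c or "cc character constant (hypothetical helper)."""
    if len(token) == 2 and token[0] == "'":
        return ord(token[1])
    if len(token) == 3 and token[0] == '"':
        # Assumed: the first character becomes the high byte, so a 16-bit
        # store on the 6809 places the characters in text order.
        return (ord(token[1]) << 8) | ord(token[2])
    raise ValueError(f"not a character constant: {token!r}")
```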

3.6. Assembler Directives

3.6.1. Data Directives

FCB expr[,...], .DB expr[,...], .BYTE expr[,...]: Include one or more constant bytes (separated by commas) in the output.
FDB expr[,...], .DW expr[,...], .WORD expr[,...]: Include one or more words (separated by commas) in the output.
FQB expr[,...], .QUAD expr[,...], .4BYTE expr[,...]: Include one or more double words (separated by commas) in the output.
FCC string, .ASCII string, .STR string: Include a string of text in the output. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string. The string is included with no modifications.
FCN string, .ASCIZ string, .STRZ string: Include a NUL terminated string of text in the output. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string. A NUL byte is automatically appended to the string.
FCS string, .ASCIS string, .STRS string: Include a string of text in the output with bit 7 of the final byte set. The first character of the operand is the delimiter which must appear as the last character and cannot appear within the string.
ZMB expr: Include a number of NUL bytes in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
ZMD expr: Include a number of zero words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
ZMQ expr: Include a number of zero double-words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted.
RMB expr, .BLKB expr, .DS expr, .RS expr: Reserve a number of bytes in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the bytes is undefined.
RMD expr: Reserve a number of words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the words is undefined.
RMQ expr: Reserve a number of double-words in the output. The number must be fully resolvable during pass 1 of assembly so no forward or external references are permitted. The value of the double-words is undefined.
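To make the byte layouts concrete, here is a Python sketch of what several of these directives emit. The function names mirror the directives but are hypothetical; the sketch ignores the cescapes pragma, leaves an empty FCS string unchanged because that case is not specified, and assumes out-of-range values are simply masked to the field width.

```python
def _delimited(operand):
    """Return the text between the delimiters of an FCC/FCN/FCS operand."""
    if len(operand) < 2:
        raise ValueError("string operand too short")
    delim, body = operand[0], operand[1:-1]
    if operand[-1] != delim or delim in body:
        raise ValueError("badly delimited string")
    return body.encode("ascii")


def fcc(operand):
    return _delimited(operand)  # included with no modifications


def fcn(operand):
    return _delimited(operand) + b"\x00"  # NUL byte appended


def fcs(operand):
    data = bytearray(_delimited(operand))
    if data:
        data[-1] |= 0x80  # set bit 7 of the final byte
    return bytes(data)


def fcb(*values):
    return bytes(v & 0xFF for v in values)


def fdb(*values):
    # Words are stored big endian (high byte first), as on the 6809.
    return b"".join((v & 0xFFFF).to_bytes(2, "big") for v in values)


def fqb(*values):
    return b"".join((v & 0xFFFFFFFF).to_bytes(4, "big") for v in values)
```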

3.6.2. Address Definition

The directives in this section all control the addresses of symbols or the assembly process itself.

ORG expr

sym EQU expr, sym = expr

Define the value of sym to be expr.

sym SET expr

SETDP expr

This directive has no effect in the object file target.

ALIGN expr

This directive is not suitable for inclusion in the middle of actual code. It is intended to appear where the bytes output will not be executed.

3.6.3. Conditional Assembly

All conditionals must be fully bracketed. That is, every conditional statement must eventually be followed by an ENDC at the same level of nesting.

IFEQ expr: If expr evaluates to zero, the conditional will be considered true.
IFNE expr, IF expr: If expr evaluates to a non-zero value, the conditional will be considered true.
IFGT expr: If expr evaluates to a value greater than zero, the conditional will be considered true.
IFGE expr: If expr evaluates to a value greater than or equal to zero, the conditional will be considered true.
IFLT expr: If expr evaluates to a value less than zero, the conditional will be considered true.
IFLE expr: If expr evaluates to a value less than or equal to zero, the conditional will be considered true.
IFDEF sym: If sym is defined at this point in the assembly process, the conditional will be considered true.
IFNDEF sym: If sym is not defined at this point in the assembly process, the conditional will be considered true.
ELSE: If the preceding conditional at the same level of nesting was false, the statements following will be assembled. If the preceding conditional at the same level was true, the statements following will not be assembled. Note that the preceding conditional might have been another ELSE statement although this behaviour is not guaranteed to be supported in future versions of LWASM.
ENDC: This directive marks the end of a conditional construct. Every conditional construct must end with an ENDC directive.
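The nesting rules above can be checked with a small Python model. filter_conditionals is hypothetical; it assumes each line holds only an operation and operand (no symbol field), evaluates expressions with a caller-supplied function (plain integers by default), treats values as signed, and does not evaluate expressions inside branches that are already being skipped.

```python
CONDITIONS = {
    "IFEQ": lambda v: v == 0,
    "IFNE": lambda v: v != 0,
    "IF": lambda v: v != 0,
    "IFGT": lambda v: v > 0,
    "IFGE": lambda v: v >= 0,
    "IFLT": lambda v: v < 0,
    "IFLE": lambda v: v <= 0,
}


def filter_conditionals(lines, defined=(), evaluate=lambda text: int(text, 0)):
    """Return the lines that would be assembled; `defined` holds known symbols."""
    output = []
    stack = []  # one [parent_active, branch_true] entry per open conditional
    for line in lines:
        words = line.split(None, 1)
        op = words[0].upper() if words else ""
        arg = words[1].strip() if len(words) > 1 else ""
        active = (stack[-1][0] and stack[-1][1]) if stack else True
        if op in CONDITIONS or op in ("IFDEF", "IFNDEF"):
            if op == "IFDEF":
                result = arg in defined
            elif op == "IFNDEF":
                result = arg not in defined
            else:
                # Short-circuit: skipped branches are not evaluated.
                result = active and CONDITIONS[op](evaluate(arg))
            stack.append([active, result])
        elif op == "ELSE":
            if not stack:
                raise ValueError("ELSE without conditional")
            stack[-1][1] = not stack[-1][1]  # flip the current branch
        elif op == "ENDC":
            if not stack:
                raise ValueError("ENDC without conditional")
            stack.pop()
        elif active:
            output.append(line)
    if stack:
        raise ValueError("missing ENDC")
    return output
```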

3.6.4. Miscellaneous Directives

This section includes directives that do not fit into the other categories.

INCLUDE filename

Include the contents of filename at this point in the assembly as though it were a part of the file currently being processed. Note that whitespace cannot appear in the name of the file.

END [expr]

ERROR string

.MODULE string

This directive is ignored for most output targets. If the output target supports encoding a module name into it, string will be used as the module name.

As of version 2.2, no supported output targets support this directive.

2.2. DECB Binaries

Both LWASM and LWLINK can output this format.

2.3. Object Files

Object files also support the concept of sections which are not valid for other output types. This allows related code from each object file linked to be collapsed together in the final binary.

3.7. Macros

Parameters passed to a macro are separated by commas and the parameter list is terminated by any whitespace. This means that neither a comma nor whitespace may be included in a macro parameter.
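A hypothetical Python helper shows the effect of these splitting rules on an invocation's operand field. Keeping the empty parameter produced by two adjacent commas is an assumption of this sketch.

```python
def split_macro_args(operand_field):
    """Split a macro invocation's operand field into parameters.

    The list ends at the first whitespace and commas separate parameters,
    so neither character can appear inside a parameter.
    """
    words = operand_field.split(None, 1)  # drop everything after whitespace
    if not words:
        return []
    return words[0].split(",")
```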
