# HG changeset patch # User lost # Date 1233209580 0 # Node ID afe30454382fd83aac90fcc409aa99b68fb9fbc0 # Parent 006d737756fdc2530aaa5fcb37001d2163545179 Made development version of LWASM be 2.1, not 3.0, because the next release will be an incremental feature release diff -r 006d737756fd -r afe30454382f configure.ac --- a/configure.ac Thu Jan 29 06:12:21 2009 +0000 +++ b/configure.ac Thu Jan 29 06:13:00 2009 +0000 @@ -1,4 +1,4 @@ -AC_INIT([LWTOOLS], [3.0], [lost@l-w.ca]) +AC_INIT([LWTOOLS], [2.1], [lost@l-w.ca]) AM_INIT_AUTOMAKE([-Wall -Werror foreign]) AC_PROG_CC AC_CONFIG_HEADERS([src/config.h]) diff -r 006d737756fd -r afe30454382f doc/manual.docbook.sgml --- a/doc/manual.docbook.sgml Thu Jan 29 06:12:21 2009 +0000 +++ b/doc/manual.docbook.sgml Thu Jan 29 06:13:00 2009 +0000 @@ -114,9 +114,944 @@ + +LWASM + +The LWTOOLS assembler is called LWASM. This chapter documents the various +features of the assembler. It is not, however, a tutorial on 6x09 assembly +language programming. + + +
+Command Line Options + +The binary for LWASM is called "lwasm". Note that the binary is in lower +case. lwasm takes the following command line arguments. + + + + + + + + +Select the DECB output format target. Equivalent to . + + + + + + + + + +Increase the debugging level. Only really useful to people hacking on the +LWASM source code itself. + + + + + + + + + +Select the output format. Valid values are for the object +file target, for the DECB LOADM format, and +for a raw binary. + + + + + + + + + + +Cause LWASM to generate a listing. If is specified, +the listing will go to that file. Otherwise it will go to the standard output +stream. By default, no listing is generated. + + + + + + + + +Select the proprietary object file format as the output target. + + + + + + + + + +Specify assembler pragmas. Multiple pragmas are separated by commas. The +pragmas accepted are the same as for the PRAGMA assembler directive described +below. + + + + + + + + + +Select raw binary as the output target. + + + + + + + + + +Present a help screen describing the command line options. + + + + + + + + +Provide a summary of the command line options. + + + + + + + + + +Display the software version. + + + + + + +
+ +
+Dialects + +LWASM supports all documented MC6809 instructions as defined by Motorola. +It also supports all known HD6309 instructions. There is some variation, +however, in the mnemonics used for the block transfer instructions. LWASM +uses TFM for all four of them as do several other assemblers. Others, such +as CCASM, use four separate opcodes for them (compare: copy+, copy-, implode, +and explode). There are advantages to both methods. However, it seems like +TFM has the most traction and thus, this is what LWASM supports. Support +for such variations may be added in the future. + + + +The standard addressing mode specifiers are supported. These are the +hash sign ("#") for immediate mode, the less than sign ("<") for forced +eight bit modes, and the greater than sign (">") for forced sixteen bit modes. + + +
+ +
+Source Format + + +LWASM accepts plain text files in a relatively free form. It can handle +lines terminated with CR, LF, CRLF, or LFCR which means it should be able +to assemble files on any platform on which it compiles. + + +Each line may start with a symbol. If a symbol is present, there must not +be any whitespace preceding it. It is legal for a line to contain nothing +but a symbol. + +The op code is separated from the symbol by whitespace. If there is +no symbol, there must be at least one white space character preceding it. +If applicable, the operand follows separated by whitespace. Following the +opcode and operand is an optional comment. + + +A comment can also be introduced with a * or a ;. The comment character is +optional for end of statement comments. However, if a symbol is the only +thing present on the line other than the comment, the comment character is +mandatory to prevent the assembler from interpreting the comment as an opcode. + + + +The opcode is not treated case sensitively. Neither are register names in +the operand fields. Symbols, however, are case sensitive. + + + +LWASM does not support line numbers in the file. + + +
+ +
+Symbols + + +Symbols have no length restriction. They may contain letters, numbers, dots, +dollar signs, and underscores. They must start with a letter, dot, or +underscore. + + + +LWASM also supports the concept of a local symbol. A local symbol is one +which contains either a "?" or a "@", which can appear anywhere in the symbol. +The scope of a local symbol is determined by a number of factors. First, +each included file gets its own local symbol scope. A blank line will also +be considered a local scope barrier. Macros each have their own local symbol +scope as well (which has a side effect that you cannot use a local symbol +as an argument to a macro). There are other factors as well. In general, +a local symbol is restricted to the block of code it is defined within. + + +
+ +
+Numbers and Expressions + +Numbers can be expressed in binary, octal, decimal, or hexadecimal. +Binary numbers may be prefixed with a "%" symbol or suffixed with a +"b" or "B". Octal numbers may be prefixed with "@" or suffixed with +"Q", "q", "O", or "o". Hexadecimal numbers may be prefixed with "$" or +suffixed with "H". No prefix or suffix is required for decimal numbers but +they can be prefixed with "&" if desired. Any constant which begins with +a letter must be expressed with the correct prefix base identifier or be +prefixed with a 0. Thus hexadecimal FF would have to be written either 0FFH +or $FF. Numbers are not case sensitive. + + + A symbol may appear at any point where a number is acceptable. The +special symbol "*" can be used to represent the starting address of the +current source line within expressions. + +The ASCII value of a character can be included by prefixing it with a +single quote ('). The ASCII values of two characters can be included by +prefixing the characters with a double quote ("). + + +LWASM supports the following basic binary operators: +, -, *, /, and %. +These represent addition, subtraction, multiplication, division, and modulus. +It also supports unary negation and unary 1's complement (- and ^ respectively). +For completeness, a unary positive (+) is supported though it is a no-op. + + +Operator precedence follows the usual rules. Multiplication, division, +and modulus take precedence over addition and subtraction. Unary operators +take precedence over binary operators. To force a specific order of evaluation, +parentheses can be used in the usual manner. + +
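The prefix and suffix rules for numeric constants can be sketched in a few lines of Python. This is only an illustration of the rules as described in this section, not LWASM's actual parser: the names `parse_number` and `_digits_to_int` are hypothetical, and the order in which prefixes and suffixes are checked is an assumption.

```python
# Hypothetical sketch of the number syntax described above; not LWASM source.
_DIGITS = "0123456789ABCDEF"
_PREFIX_BASE = {"%": 2, "@": 8, "$": 16, "&": 10}
_SUFFIX_BASE = {"H": 16, "Q": 8, "O": 8, "B": 2}


def _digits_to_int(digits, base):
    """Convert a digit string in the given base, rejecting anything else."""
    if not digits:
        raise ValueError("no digits in constant")
    value = 0
    for ch in digits.upper():
        digit = _DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"bad digit {ch!r} for base {base}")
        value = value * base + digit
    return value


def parse_number(text):
    """Parse one numeric constant using the prefix/suffix rules."""
    if not text:
        raise ValueError("empty constant")
    # Assumption: a base prefix is checked before any suffix.
    if text[0] in _PREFIX_BASE:
        return _digits_to_int(text[1:], _PREFIX_BASE[text[0]])
    # Without a prefix, a constant must start with a digit (hence 0FFH).
    if not text[0].isdigit():
        raise ValueError("constants without a prefix must start with a digit")
    suffix = text[-1].upper()
    if suffix in _SUFFIX_BASE:
        return _digits_to_int(text[:-1], _SUFFIX_BASE[suffix])
    return _digits_to_int(text, 10)
```

Under these assumptions, `parse_number("$FF")` and `parse_number("0FFH")` give the same value, while `parse_number("FFH")` is rejected because it would be read as a symbol rather than a number.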
+ +
+Assembler Directives + +Various directives can be used to control the behaviour of the +assembler or to include non-code/data in the resulting output. Those directives +that are not described in detail in other sections of this document are +described below. + + +
+Data Directives + +FCB expr[,...] + +Include one or more constant bytes (separated by commas) in the output. + + + +FDB expr[,...] + +Include one or more words (separated by commas) in the output. + + + +FQB expr[,...] + +Include one or more double-words (separated by commas) in the output. + + + +FCC string + + +Include a string of text in the output. The first character of the operand +is the delimiter which must appear as the last character and cannot appear +within the string. The string is included with no modifications. + + + + +FCN string + + +Include a NUL terminated string of text in the output. The first character of +the operand is the delimiter which must appear as the last character and +cannot appear within the string. A NUL byte is automatically appended to +the string. + + + + +FCS string + + +Include a string of text in the output with bit 7 of the final byte set. The +first character of the operand is the delimiter which must appear as the last +character and cannot appear within the string. + + + + +ZMB expr + + +Include a number of NUL bytes in the output. The number must be fully resolvable +during pass 1 of assembly so no forward or external references are permitted. + + + + +ZMD expr + + +Include a number of zero words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. + + + + +ZMQ expr + + +Include a number of zero double-words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. + + + + +RMB expr + + +Reserve a number of bytes in the output. The number must be fully resolvable +during pass 1 of assembly so no forward or external references are permitted. +The value of the bytes is undefined. + + + + +RMD expr + + +Reserve a number of words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. 
The value of the words is undefined. + + + + +RMQ expr + + +Reserve a number of double-words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. The value of the double-words is undefined. + + + + + +
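The difference between the three string directives is easiest to see in the bytes they would produce. The following Python sketch models FCC, FCN, and FCS as described above. The function names are hypothetical, the text is assumed to be plain ASCII, and the handling of an empty string in `fcs` is an assumption since this section does not cover it.

```python
# Hypothetical model of the FCC/FCN/FCS operand rules; not LWASM source.
def _delimited_body(operand):
    """Strip the delimiter (first character) from both ends of the operand."""
    if len(operand) < 2 or operand[-1] != operand[0]:
        raise ValueError("string must start and end with the same delimiter")
    body = operand[1:-1]
    if operand[0] in body:
        raise ValueError("delimiter cannot appear within the string")
    return body.encode("ascii")


def fcc(operand):
    """FCC: the string is included with no modifications."""
    return _delimited_body(operand)


def fcn(operand):
    """FCN: a NUL byte is appended to the string."""
    return _delimited_body(operand) + b"\x00"


def fcs(operand):
    """FCS: bit 7 of the final byte is set."""
    data = bytearray(_delimited_body(operand))
    if data:  # assumption: an empty string produces no output
        data[-1] |= 0x80
    return bytes(data)
```

For example, with "/" as the delimiter, `fcs("/HI/")` yields the byte for "H" followed by the byte for "I" with its high bit set.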
+ +
+Address Definition +The directives in this section all control the addresses of symbols +or the assembly process itself. + + +ORG expr + +Set the assembly address. The address must be fully resolvable on the +first pass so no external or forward references are permitted. ORG is not +permitted within sections when outputting to object files. For the DECB +target, each ORG directive after which output is generated will cause +a new preamble to be output. ORG is only used to determine the addresses +of symbols when the raw target is used. + + + + + +sym EQU expr +sym = expr + +Define the value of sym to be expr. + + + + +sym SET expr + +Define the value of sym to be expr. +Unlike EQU, SET permits symbols to be defined multiple times as long as SET +is used for all instances. Use of the symbol before the first SET statement +that sets its value is undefined. + + + + +SETDP expr + +Inform the assembler that it can assume the DP register contains +expr. This directive is only advice to the assembler +to determine whether an address is in the direct page and has no effect +on the contents of the DP register. The value must be fully resolved during +the first assembly pass because it affects the sizes of subsequent instructions. + +This directive has no effect in the object file target. + + + + + +ALIGN expr + +Force the current assembly address to be a multiple of expr. +A series of NUL bytes is output to force the alignment, if required. The +alignment value must be fully resolved on the first pass because it affects +the addresses of subsequent instructions. +This directive is not suitable for inclusion in the middle of actual +code. It is intended to appear where the bytes output will not be executed. + + + + + + +
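The number of NUL bytes that ALIGN has to emit follows directly from the current address and the alignment value. A minimal Python sketch of that arithmetic, using a hypothetical helper name `align_padding` and assuming a positive alignment:

```python
# Hypothetical helper showing the arithmetic behind ALIGN; not LWASM source.
def align_padding(address, alignment):
    """Return how many padding bytes bring address up to a multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # (-address) mod alignment is the distance to the next multiple,
    # and is zero when the address is already aligned.
    return (-address) % alignment
```

For instance, `align_padding(0x1001, 4)` is 3, and an address that is already a multiple of the alignment needs no padding.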
+ +
+Conditional Assembly + +Portions of the source code can be excluded or included based on conditions +known at assembly time. Conditionals can be nested arbitrarily deeply. The +directives associated with conditional assembly are described in this section. + +All conditionals must be fully bracketed. That is, every conditional +statement must eventually be followed by an ENDC at the same level of nesting. + +Conditional expressions are only evaluated on the first assembly pass. +It is not possible to game the assembly process by having a conditional +change its value between assembly passes. Thus there is not and never will +be any equivalent of IFP1 or IFP2 as provided by other assemblers. + + + +IFEQ expr + +If expr evaluates to zero, the conditional +will be considered true. + + + + + +IFNE expr +IF expr + +If expr evaluates to a non-zero value, the conditional +will be considered true. + + + + + +IFGT expr + +If expr evaluates to a value greater than zero, the conditional +will be considered true. + + + + + +IFGE expr + +If expr evaluates to a value greater than or equal to zero, the conditional +will be considered true. + + + + + +IFLT expr + +If expr evaluates to a value less than zero, the conditional +will be considered true. + + + + + +IFLE expr + +If expr evaluates to a value less than or equal to zero, the conditional +will be considered true. + + + + + +IFDEF sym + +If sym is defined at this point in the assembly +process, the conditional +will be considered true. + + + + + +IFNDEF sym + +If sym is not defined at this point in the assembly +process, the conditional +will be considered true. + + + + + +ELSE + + +If the preceding conditional at the same level of nesting was false, the +statements following will be assembled. If the preceding conditional at +the same level was true, the statements following will not be assembled. 
+Note that the preceding conditional might have been another ELSE statement +although this behaviour is not guaranteed to be supported in future versions +of LWASM. + + + + +ENDC + + +This directive marks the end of a conditional construct. Every conditional +construct must end with an ENDC directive. + + + + + +
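The expression-based conditionals all compare the value of expr against zero. The table below restates them as Python predicates. It is a sketch only: `condition_is_true` is a hypothetical name, the expression is assumed to have been evaluated to a signed integer already, and IFDEF and IFNDEF are left out because they test symbol definitions rather than values.

```python
# Hypothetical restatement of the IFxx truth rules; not LWASM source.
_CONDITIONS = {
    "IFEQ": lambda value: value == 0,
    "IFNE": lambda value: value != 0,
    "IF": lambda value: value != 0,   # IF is listed alongside IFNE
    "IFGT": lambda value: value > 0,
    "IFGE": lambda value: value >= 0,
    "IFLT": lambda value: value < 0,
    "IFLE": lambda value: value <= 0,
}


def condition_is_true(directive, value):
    """Return whether the conditional directive is true for the given value."""
    # Opcodes are not case sensitive, so normalise before the lookup.
    return _CONDITIONS[directive.upper()](value)
```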
+ +
+Miscellaneous Directives + +This section includes directives that do not fit into the other +categories. + + + + +INCLUDE filename + + +Include the contents of filename at this point in +the assembly as though it were a part of the file currently being processed. +Note that whitespace cannot appear in the name of the file. + + + + + +END [expr] + + +This directive causes the assembler to stop assembling immediately as though +it ran out of input. For the DECB target only, expr +can be used to set the execution address of the resulting binary. For all +other targets, specifying expr will cause an error. + + + + + +ERROR string + + +Causes a custom error message to be printed at this line. This will cause +assembly to fail. This directive is most useful inside conditional constructs +to cause assembly to fail if some condition that is known bad happens. + + + + + +
+ +
+ +
+Macros + +LWASM is a macro assembler. A macro is simply a name that stands in for a +series of instructions. Once a macro is defined, it is used like any other +assembler directive. Defining a macro can be considered equivalent to adding +additional assembler directives. + +Macros may accept parameters. These parameters are referenced within +a macro by a backslash ("\") followed by a digit 1 through 9 for the first +through ninth parameters. They may also be referenced by enclosing the +decimal parameter number in braces ("{num}"). These parameter references +are replaced with the verbatim text of the parameter passed to the macro. A +reference to a non-existent parameter will be replaced by an empty string. +Macro parameters are expanded everywhere on each source line. That means +the parameter to a macro could be used as a symbol or it could even appear +in a comment or could cause an entire source line to be commented out +when the macro is expanded. + + +Parameters passed to a macro are separated by commas and the parameter list +is terminated by any whitespace. This means that neither a comma nor whitespace +may be included in a macro parameter. + + +Macro expansion is done recursively. That is, within a macro, macros are +expanded. This can lead to infinite loops in macro expansion. If the assembler +hangs for a long time while assembling a file that uses macros, this may be +the reason. + +Each macro expansion receives its own local symbol context which is not +inherited by any macros called by it nor is it inherited from the context +the macro was instantiated in. That means it is possible to use local symbols +within macros without having them collide with symbols in other macros or +outside the macro itself. 
However, this also means that using a local symbol +as a parameter to a macro, while legal, will not do what it would seem to do +as it will result in looking up the local symbol in the macro's symbol context +rather than the enclosing context where it came from, likely yielding either +an undefined symbol error or bizarre assembly results. + + +Note that there is no way to define a macro as local to a symbol context. All +macros are part of the global macro namespace. However, macros have a separate +namespace from symbols so it is possible to have a symbol with the same name +as a macro. + + + +Macros are defined only during the first pass. Macro expansion also +only occurs during the first pass. On the second pass, the macro +definition is simply ignored. Macros must be defined before they are used. + + +The following directives are used when defining macros. + + + +macroname MACRO + +This directive is used to begin the definition of a macro called +macroname. If macroname already +exists, it is considered an error. Attempting to define a macro within a +macro is undefined. It may work and it may not so the behaviour should not +be relied upon. + + + + + +ENDM + + +This directive indicates the end of the macro currently being defined. It +causes the assembler to resume interpreting source lines as normal. + + + + +
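The parameter substitution rules for macros can be modelled as plain text replacement. The Python sketch below follows the description in this chapter: parameters are split on commas, the list ends at the first whitespace, and both the backslash-digit and braced forms are replaced with the verbatim parameter text or with an empty string when the parameter does not exist. The names `split_params` and `expand_line` are hypothetical, and recursion and local symbol contexts are deliberately not modelled.

```python
# Hypothetical model of macro parameter substitution; not LWASM source.
import re

# Matches \1 through \9, or a decimal parameter number in braces such as {12}.
_PARAM_REF = re.compile(r"\\([1-9])|\{(\d+)\}")


def split_params(operand):
    """Split a macro operand field into parameters."""
    # The parameter list is terminated by any whitespace.
    fields = operand.split(None, 1)
    return fields[0].split(",") if fields else []


def expand_line(line, params):
    """Replace every parameter reference on one macro body line."""
    def replace(match):
        number = int(match.group(1) or match.group(2))
        # A reference to a non-existent parameter becomes an empty string.
        return params[number - 1] if 1 <= number <= len(params) else ""
    return _PARAM_REF.sub(replace, line)
```

Because the replacement is applied to the whole line, a parameter reference inside a comment is expanded as well, which matches the behaviour described above.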
+ +
+Object Files and Sections + +The object file target is very useful for large projects because it allows +multiple files to be assembled independently and then linked into the final +binary at a later time. It allows only the small portion of the project +that was modified to be re-assembled rather than requiring the entire set +of source code to be available to the assembler in a single assembly process. +This can be particularly important if there are a large number of macros, +symbol definitions, or other metadata that uses resources at assembly time. +By far the largest benefit, however, is keeping the source files small enough +for a mere mortal to find things in them. + + + +With multi-file projects, there needs to be a means of resolving references to +symbols in other source files. These are known as external references. The +addresses of these symbols cannot be known until the linker joins all the +object files into a single binary. This means that the assembler must be +able to output the object code without knowing the value of the symbol. This +places some restrictions on the code generated by the assembler. For +example, the assembler cannot generate direct page addressing for instructions +that reference external symbols because the address of the symbol may not +be in the direct page. Similarly, relative branches and PC relative addressing +cannot be used in their eight bit forms. Everything that must be resolved +by the linker must be assembled to use the largest address size possible to +allow the linker to fill in the correct value at link time. Note that the +same problem applies to absolute address references as well, even those in +the same source file, because the address is not known until link time. + + + +It is often desired in multi-file projects to have code of various types grouped +together in the final binary generated by the linker as well. The same applies +to data. 
In order for the linker to do that, the bits that are to be grouped +must be tagged in some manner. This is where the concept of sections comes in. +Each chunk of code or data is part of a section in the object file. Then, +when the linker reads all the object files, it coalesces all sections of the +same name into a single section and then considers it as a unit. + + + +The existence of sections, however, raises a problem for symbols even +within the same source file. Thus, the assembler must treat symbols from +different sections within the same source file in the same manner as external +symbols. That is, it must leave them for the linker to resolve at link time, +with all the limitations that entails. + + + +In the object file target mode, LWASM requires all source lines that +cause bytes to be output to be inside a section. Any directives that do +not cause any bytes to be output can appear outside of a section. This includes +such things as EQU or RMB. Even ORG can appear outside a section. ORG, however, +makes no sense within a section because it is the linker that determines +the starting address of the section's code, not the assembler. + + + +All symbols defined globally in the assembly process are local to the +source file and cannot be exported. All symbols defined within a section are +considered local to the source file unless otherwise explicitly exported. +Symbols referenced from external source files must be declared external, +either explicitly or by asking the assembler to assume that all undefined +symbols are external. + + + +It is often handy to define a number of memory addresses that will be +used for data at run-time but which need not be included in the binary file. +These memory addresses are not initialized until run-time, either by the +program itself or by the program loader, depending on the operating environment. +Such sections are often known as BSS sections. 
LWASM supports generating +sections with a BSS attribute set, which causes the section definition, including +symbols exported from that section and those symbols required to resolve +references from the local file, to be written to the object file but with no actual code. +It is illegal for any source lines within a BSS flagged section to cause any +bytes to be output. + + +The following directives apply to section handling. + + + +SECTION name[,flags] +SECT name[,flags] + + +Instructs the assembler that the code following this directive is to be +considered part of the section name. A section name +may appear multiple times in which case it is as though all the code from +all the instances of that section appeared adjacent within the source file. +However, flags may only be specified on the first +instance of the section. + +There is a single flag supported in flags. The +flag bss will cause the section to be treated as a BSS +section and, thus, no code will be included in the object file nor will any +bytes be permitted to be output. + +If assembly is already happening within a section, the section is implicitly +ended and the new section started. This is not considered an error although +it is recommended that all sections be explicitly closed. + + + + + +ENDSECTION +ENDSECT +ENDS + + +This directive ends the current section. This puts assembly outside of any +sections until the next SECTION directive. + + + + +sym EXTERN +sym EXTERNAL +sym IMPORT + + +This directive defines sym as an external symbol. +This directive may occur at any point in the source code. EXTERN definitions +are resolved on the first pass so an EXTERN definition anywhere in the +source file is valid for the entire file. The use of this directive is +optional when the assembler is instructed to assume that all undefined +symbols are external. In fact, in that mode, if the symbol is referenced +before the EXTERN directive, an error will occur. 
+ + + + + +sym EXPORT + + +This directive defines sym as an exported symbol. +This directive may occur at any point in the source code, even before the +definition of the exported symbol. + + + + + + +
+ +
+Assembler Modes and Pragmas + +There are a number of options that affect the way assembly is performed. +Some of these options can only be specified on the command line because +they determine something absolute about the assembly process. These include +such things as the output target. Other things may be switchable during +the assembly process. These are known as pragmas and are, by definition, +not portable between assemblers. + + +LWASM supports a number of pragmas that affect code generation or +otherwise affect the behaviour of the assembler. These may be specified by +way of a command line option or by assembler directives. The directives +are as follows. + + + + +PRAGMA pragma[,...] + + +Specifies that the assembler should bring into force all pragmas +specified. Any unrecognized pragma will cause an assembly error. The new +pragmas will take effect immediately. This directive should be used when +the program will assemble incorrectly if the pragma is ignored or not supported. + + + + + +*PRAGMA pragma[,...] + + +This is identical to the PRAGMA directive except no error will occur with +unrecognized or unsupported pragmas. This directive, by virtue of starting +with a comment character, will also be ignored by assemblers that do not +support this directive. Use this variation if the pragma is not required +for correct functioning of the code. + + + + + +Each pragma supported has a positive version and a negative version. +The positive version enables the pragma while the negative version disables +it. The negative version is simply the positive version with "no" prefixed +to it. For instance, "pragma" vs. "nopragma". Only the positive version is +listed below. + +Pragmas are not case sensitive. + + + +index0tonone + + +When in force, this pragma enables an optimization affecting indexed addressing +modes. 
When the offset expression in an indexed mode evaluates to zero but is +not explicitly written as 0, this will replace the operand with the equivalent +no offset mode, thus creating slightly faster code. Because of the advantages +of this optimization, it is enabled by default. + + + + + +undefextern + + +This pragma is only valid for targets that support external references. When in +force, if the assembler sees an undefined symbol on the second pass, it will +automatically define it as an external symbol. This automatic definition will +apply for the remainder of the assembly process, even if the pragma is +subsequently turned off. Because this behaviour would be potentially surprising, +this pragma defaults to off. + + +The primary use for this pragma is for projects that share a large number of +symbols between source files. In such cases, it is impractical to enumerate +all the external references in every source file. This allows the assembler +and linker to do the heavy lifting while not preventing a particular source +module from defining a local symbol of the same name as an external symbol +if it does not need the external symbol. (This pragma will not cause an +automatic external definition if there is already a locally defined symbol.) + + +This pragma will often be specified on the command line for large projects. +However, depending on the specific dynamics of the project, it may be sufficient +for one or two files to use this pragma internally. + + + + + +
+ +
+ + +LWLINK + + + + Object Files - + +LWTOOLS uses a proprietary object file format. It is proprietary in the sense +that it is specific to LWTOOLS, not that it is a hidden format. It would be +hard to keep it hidden in an open source tool chain anyway. This chapter +documents the object file format. +