# HG changeset patch
# User William Astle
# Date 1378699092 21600
# Node ID 40ecbd5da4810d47a05228d875819f9a8c87a40c
# Parent  83f682ed4d6575769d59a1617c89fc94bb8e5a98
Part one of the C preprocessor

This is part one of the C preprocessor. It finds and then fails to interpret
directives. Also handles line splicing and trigraphs.

diff -r 83f682ed4d65 -r 40ecbd5da481 .hgignore
--- a/.hgignore	Sun Sep 08 17:08:50 2013 -0600
+++ b/.hgignore	Sun Sep 08 21:58:12 2013 -0600
@@ -10,3 +10,4 @@
 /lwar$
 /lwasm$
 /lwcc$
+/lwcc-cpp$
diff -r 83f682ed4d65 -r 40ecbd5da481 Makefile
--- a/Makefile	Sun Sep 08 17:08:50 2013 -0600
+++ b/Makefile	Sun Sep 08 21:58:12 2013 -0600
@@ -55,9 +55,10 @@
 	lwlink/lwlink$(PROGSUFFIX) \
 	lwar/lwar$(PROGSUFFIX) \
 	lwlink/lwobjdump$(PROGSUFFIX) \
-	lwcc/driver/lwcc$(PROGSUFFIX)
+	lwcc/driver/lwcc$(PROGSUFFIX) \
+	lwcc/cpp/lwcc-cpp$(PROGSUFFIX)
 
-LWCC_LIBBIN_FILES =
+LWCC_LIBBIN_FILES = lwcc/cpp/lwcc-cpp$(PROGSUFFIX)
 LWCC_LIBLIB_FILES =
 LWCC_LIBINC_FILES =
 
@@ -100,12 +101,18 @@
 lwcc_driver_objs := $(lwcc_driver_srcs:.c=.o)
 lwcc_driver_deps := $(lwcc_driver_srcs:.c=.d)
 
+lwcc_cpp_srcs := main.c error.c file.c
+lwcc_cpp_srcs := $(addprefix lwcc/cpp/,$(lwcc_cpp_srcs))
+lwcc_cpp_objs := $(lwcc_cpp_srcs:.c=.o)
+lwcc_cpp_deps := $(lwcc_cpp_srcs:.c=.d)
+
 .PHONY: lwlink lwasm lwar lwobjdump lwcc
 lwlink: lwlink/lwlink$(PROGSUFFIX)
 lwasm: lwasm/lwasm$(PROGSUFFIX)
 lwar: lwar/lwar$(PROGSUFFIX)
 lwobjdump: lwlink/lwobjdump$(PROGSUFFIX)
-lwcc: lwcc/driver/lwcc
+lwcc: lwcc/driver/lwcc$(PROGSUFFIX)
+lwcc-cpp: lwcc/cpp/lwcc-cpp$(PROGSUFFIX)
 
 lwasm/lwasm$(PROGSUFFIX): $(lwasm_objs) lwlib
 	@echo Linking $@
@@ -127,6 +134,10 @@
 	@echo Linking $@
 	@$(CC) -o $@ $(lwcc_driver_objs) $(LDFLAGS)
 
+lwcc/cpp/lwcc-cpp$(PROGSUFFIX): $(lwcc_cpp_objs) lwlib
+	@echo Linking $@
+	@$(CC) -o $@ $(lwcc_cpp_objs) $(LDFLAGS)
+
 #.PHONY: lwlib
 .INTERMEDIATE: lwlib
 lwlib: lwlib/liblw.a
@@ -157,8 +168,8 @@
 clean: $(cleantargs)
 	@echo "Cleaning up"
 	@rm -f lwlib/liblw.a lwasm/lwasm$(PROGSUFFIX) lwlink/lwlink$(PROGSUFFIX) lwlink/lwobjdump$(PROGSUFFIX) lwar/lwar$(PROGSUFFIX)
-	@rm -f lwcc/driver/lwcc$(PROGSUFFIX)
-	@rm -f $(lwcc_driver_ojbs)
+	@rm -f lwcc/driver/lwcc$(PROGSUFFIX) lwcc/cpp/lwcc-cpp$(PROGSUFFIX)
+	@rm -f $(lwcc_driver_objs) $(lwcc_cpp_objs)
 	@rm -f $(lwasm_objs) $(lwlink_objs) $(lwar_objs) $(lwlib_objs) $(lwobjdump_objs)
 	@rm -f $(extra_clean)
 	@rm -f */*.exe
@@ -182,13 +193,13 @@
 	install -d $(LWCC_INSTALLLIBDIR)/lib
 	install -d $(LWCC_INSTALLLIBDIR)/include
 ifneq ($(LWCC_LIBBIN_FILES),)
-	install $(LWCC_LIBBIN_FILES) $(LIBCC_INSTALLLIBDIR)/bin
+	install $(LWCC_LIBBIN_FILES) $(LWCC_INSTALLLIBDIR)/bin
 endif
 ifneq ($(LWCC_LIBLIB_FILES),)
-	install $(LWCC_LIBLIB_FILES) $(LIBCC_INSTALLLIBDIR)/lib
+	install $(LWCC_LIBLIB_FILES) $(LWCC_INSTALLLIBDIR)/lib
 endif
 ifneq ($(LWCC_LIBINC_FILES),)
-	install $(LWCC_LIBINC_FILES) $(LIBCC_INSTALLLIBDIR)/include
+	install $(LWCC_LIBINC_FILES) $(LWCC_INSTALLLIBDIR)/include
 endif
 
 .PHONY: test
diff -r 83f682ed4d65 -r 40ecbd5da481 lwcc/README.txt
--- a/lwcc/README.txt	Sun Sep 08 17:08:50 2013 -0600
+++ b/lwcc/README.txt	Sun Sep 08 21:58:12 2013 -0600
@@ -17,6 +17,13 @@
 likely to change substantially.
 
 
+cpp/
+
+This is the actual C preprocessor. Its specific interface is deliberately
+undocumented. Do not call it directly. Ever. Just don't. Bad Things(tm) will
+happen if you do.
+
+
 liblwcc/
 
 This contains any runtime libraries the compiler needs to support its
diff -r 83f682ed4d65 -r 40ecbd5da481 lwcc/cpp/cpp.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lwcc/cpp/cpp.h	Sun Sep 08 21:58:12 2013 -0600
@@ -0,0 +1,57 @@
+/*
+lwcc/cpp/cpp.h
+
+Copyright © 2013 William Astle
+
+This file is part of LWTOOLS.
+
+LWTOOLS is free software: you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+more details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef cpp_h_seen___
+#define cpp_h_seen___
+
+#include <stdio.h>
+
+enum
+{
+	CPP_NOUNG = -3,
+	CPP_EOL = -2,
+	CPP_EOF = -1,
+};
+
+struct file_stack_e
+{
+	const char *fn;
+	FILE *fp;
+	struct file_stack_e *next;
+	int line;
+	int col;
+	int eolstate;		// end of line state for interpreting \r\n \n\r \n \r
+	int ra;			// read ahead byte for trigraph scan
+	int qseen;		// number of ? seen during trigraph scan
+	int unget;		// character that has been "ungot"
+	int curc;		// the most recent character retrieved
+};
+
+extern FILE *output_fp;
+extern int trigraphs;
+extern struct file_stack_e *file_stack;
+
+extern int process_file(const char *);
+
+extern void do_error(const char *, ...);
+extern void do_warning(const char *, ...);
+
+#endif // cpp_h_seen___
diff -r 83f682ed4d65 -r 40ecbd5da481 lwcc/cpp/error.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lwcc/cpp/error.c	Sun Sep 08 21:58:12 2013 -0600
@@ -0,0 +1,59 @@
+/*
+lwcc/cpp/error.c
+
+Copyright © 2013 William Astle
+
+This file is part of LWTOOLS.
+
+LWTOOLS is free software: you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+more details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpp.h"
+
+static void show_file_pos(void)
+{
+	if (file_stack == NULL)
+		return;
+
+	fprintf(stderr, "(%s:%d): ", file_stack -> fn, file_stack -> line);
+}
+
+void do_error(const char *f, ...)
+{
+	va_list arg;
+
+	va_start(arg, f);
+	fprintf(stderr, "ERROR: ");
+	show_file_pos();
+	vfprintf(stderr, f, arg);
+	fprintf(stderr, "\n");
+	va_end(arg);
+	exit(1);
+}
+
+void do_warning(const char *f, ...)
+{
+	va_list arg;
+
+	va_start(arg, f);
+	fprintf(stderr, "WARNING: ");
+	show_file_pos();
+	vfprintf(stderr, f, arg);
+	fprintf(stderr, "\n");
+	va_end(arg);
+}
diff -r 83f682ed4d65 -r 40ecbd5da481 lwcc/cpp/file.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lwcc/cpp/file.c	Sun Sep 08 21:58:12 2013 -0600
@@ -0,0 +1,636 @@
+/*
+lwcc/cpp/file.c
+
+Copyright © 2013 William Astle
+
+This file is part of LWTOOLS.
+
+LWTOOLS is free software: you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+more details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+NOTES:
+
+The function fetch_byte() grabs a byte from the input file. It returns
+CPP_EOF if end of file has been reached. The resulting byte has passed
+through three filters, in order:
+
+* All CRLF, LFCR, LF, and CR have been converted to CPP_EOL
+* If enabled (--trigraphs), trigraphs have been interpreted
+* \\n (backslash-newline) has been processed (eliminated)
+
+To obtain a byte without processing \\n, call fetch_byte_tg().
+
+*/
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <lw_alloc.h>
+
+#include "cpp.h"
+
+struct file_stack_e *file_stack = NULL;
+
+int is_whitespace(int c)
+{
+	switch (c)
+	{
+	case ' ':
+	case '\t':
+	case '\r':
+	case '\n':
+		return 1;
+	}
+	return 0;
+}
+
+int is_sidchr(int c)
+{
+	if (c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
+		return 1;
+	return 0;
+}
+
+int is_idchr(int c)
+{
+	if (c >= '0' && c <= '9')
+		return 1;
+	return is_sidchr(c);
+}
+
+int is_ep(int c)
+{
+	if (c == 'e' || c == 'E' || c == 'p' || c == 'P')
+		return 1;
+	return 0;
+}
+
+int is_hex(int c)
+{
+	if (c >= 'a' && c <= 'f')
+		return 1;
+	if (c >= 'A' && c <= 'F')
+		return 1;
+	if (c >= '0' && c <= '9')
+		return 1;
+	return 0;
+}
+
+int is_dec(int c)
+{
+	if (c >= '0' && c <= '9')
+		return 1;
+	return 0;
+}
+
+static void outchr(int c)
+{
+	fputc(c, output_fp);
+}
+
+static void outstr(char *s)
+{
+	while (*s)
+		outchr(*s++);
+}
+
+int fetch_byte_ll(struct file_stack_e *f)
+{
+	int c;
+
+	if (f -> eolstate != 0)
+	{
+		f -> line++;
+		f -> col = 0;
+	}
+	c = getc(f -> fp);
+	f -> col++;
+	if (f -> eolstate == 1)
+	{
+		// just saw CR, munch LF
+		if (c == 10)
+			c = getc(f -> fp);
+		f -> eolstate = 0;
+	}
+	else if (f -> eolstate == 2)
+	{
+		// just saw LF, munch CR
+		if (c == 13)
+			c = getc(f -> fp);
+		f -> eolstate = 0;
+	}
+
+	if (c == 10)
+	{
+		// we have LF - end of line, flag to munch CR
+		f -> eolstate = 2;
+		c = CPP_EOL;
+	}
+	else if (c == 13)
+	{
+		// we have CR - end of line, flag to munch LF
+		f -> eolstate = 1;
+		c = CPP_EOL;
+	}
+	else if (c == EOF)
+	{
+		c = CPP_EOF;
+	}
+	return c;
+}
+
+int fetch_byte_tg(struct file_stack_e *f)
+{
+	int c;
+
+	if (!trigraphs)
+	{
+		c = fetch_byte_ll(f);
+	}
+	else
+	{
+		/* we have to do the trigraph shit here */
+		if (f -> ra != CPP_NOUNG)
+		{
+			if (f -> qseen > 0)
+			{
+				c = '?';
+				f -> qseen -= 1;
+				return c;
+			}
+			else
+			{
+				c = f -> ra;
+				f -> ra = CPP_NOUNG;
+				return c;
+			}
+		}
+
+		c = fetch_byte_ll(f);
+		while (c == '?')
+		{
+			f -> qseen++;
+			c = fetch_byte_ll(f);
+		}
+
+		if (f -> qseen >= 2)
+		{
+			// we have a trigraph
+			switch (c)
+			{
+			case '=':
+				c = '#';
+				f -> qseen -= 2;
+				break;
+
+			case '/':
+				c = '\\';
+				f -> qseen -= 2;
+				break;
+
+			case '\'':
+				c = '^';
+				f -> qseen -= 2;
+				break;
+
+			case '(':
+				c = '[';
+				f -> qseen -= 2;
+				break;
+
+			case ')':
+				c = ']';
+				f -> qseen -= 2;
+				break;
+
+			case '!':
+				c = '|';
+				f -> qseen -= 2;
+				break;
+
+			case '<':
+				c = '{';
+				f -> qseen -= 2;
+				break;
+
+			case '>':
+				c = '}';
+				f -> qseen -= 2;
+				break;
+
+			case '-':
+				c = '~';
+				f -> qseen -= 2;
+				break;
+			}
+			if (f -> qseen > 0)
+			{
+				f -> ra = c;
+				c = '?';
+				f -> qseen--;
+			}
+		}
+		else if (f -> qseen > 0)
+		{
+			f -> ra = c;
+			c = '?';
+			f -> qseen--;
+		}
+	}
+	return c;
+}
+
+int fetch_byte(struct file_stack_e *f)
+{
+	int c;
+
+again:
+	if (f -> unget != CPP_NOUNG)
+	{
+		c = f -> unget;
+		f -> unget = CPP_NOUNG;
+	}
+	else
+	{
+		c = fetch_byte_tg(f);
+	}
+	if (c == '\\')
+	{
+		int c2;
+		c2 = fetch_byte_tg(f);
+		if (c2 == CPP_EOL)
+			goto again;
+		else
+			f -> unget = c2;
+	}
+	f -> curc = c;
+	return c;
+}
+
+static void skip_line(struct file_stack_e *f)
+{
+	int c;
+	while ((c = fetch_byte(f)) != CPP_EOL && c != CPP_EOF)
+		/* do nothing */ ;
+}
+
+
+struct
+{
+	char *name;
+	void (*fn)(struct file_stack_e *);
+} directives[] =
+{
+	{ NULL, NULL },
+	{ NULL, NULL }
+};
+
+/*
+This handles a preprocessing directive. Such a directive goes from the
+next character to be retrieved from f until the first instance of CPP_EOL
+or CPP_EOF.
+*/
+void handle_directive(struct file_stack_e *f)
+{
+	int c, i;
+	char kw[20];
+
+again:
+	while ((c = fetch_byte(f)) == ' ' || c == '\t')
+		/* do nothing */ ;
+	if (c == '/')
+	{
+		// maybe a comment //
+		c = fetch_byte(f);
+		if (c == '/')
+		{
+			// line comment
+			skip_line(f);
+			return;
+		}
+		if (c == '*')
+		{
+			// block comment
+			while (1)
+			{
+				c = fetch_byte(f);
+				if (c == CPP_EOF)
+					return;
+				if (c == '*')
+				{
+					c = fetch_byte(f);
+					if (c == '/')
+					{
+						// end of comment - try again for directive
+						goto again;
+					}
+					if (c == CPP_EOF)
+						return;
+				}
+			}
+		}
+	}
+
+	// empty directive - do nothing
+	if (c == CPP_EOL)
+		return;
+
+	if (c < 'a' || c > 'z')
+		goto out;
+
+	i = 0;
+	do
+	{
+		kw[i++] = c;
+		if (i == sizeof(kw) - 1)
+			goto out; // keyword too long
+		c = fetch_byte(f);
+	} while ((c >= 'a' && c <= 'z') || (c == '_'));
+	kw[i++] = '\0';
+
+	/* we have a keyword here */
+	for (i = 0; directives[i].name; i++)
+	{
+		if (strcmp(directives[i].name, kw) == 0)
+		{
+			(*directives[i].fn)(f);
+			return;
+		}
+	}
+
+/* if we fall through here, we have an unknown directive */
+out:
+	do_error("invalid preprocessor directive");
+	skip_line(f);
+}
+
+/*
+Notes:
+
+Rather than tokenize the entire file, we run through it interpreting
+things only as much as we need to in order to identify the following:
+
+preprocessing directives (#...)
+identifiers which might need to be replaced with macros
+
+We have to interpret strings, character constants, and numbers to prevent
+false positives in those situations.
+
+When we find a preprocessing directive, it is handled with a more
+aggressive tokenization process and then interpreted accordingly.
+
+nlws is used to record the fact that only whitespace has occurred at the
+start of a line. Whitespace is defined as comments or isspace(c). It gets
+reset to 1 after each EOL character. If a non-whitespace character is
+encountered, it is set to -1. If the character processing decides it really
+is a whitespace character, it will set nlws back to 1 (block comment).
+Elsewise, it will get set to 0 if it is still -1 when the loop starts again.
+
+This is needed so we can identify whitespace interposed before a
+preprocessor directive. This is the only case where it matters for
+the preprocessor.
+
+*/
+void preprocess_file(struct file_stack_e *f)
+{
+	int c;
+	int nlws = 1;
+
+	while (1)
+	{
+		c = fetch_byte(f);
+again:
+		if (nlws == -1)
+			nlws = 0;
+		if (c == CPP_EOF)
+		{
+			outchr('\n');
+			return;
+		}
+		if (c == CPP_EOL)
+		{
+			nlws = 1;
+			outchr('\n');
+			continue;
+		}
+
+		if (!is_whitespace(c))
+			nlws = -1;
+
+		if (is_sidchr(c))
+		{
+			// have identifier here - parse it off
+			char *ident = NULL;
+			int idlen = 0;
+
+			do
+			{
+				ident = lw_realloc(ident, idlen + 2);
+				ident[idlen++] = c;
+				ident[idlen] = '\0';
+				c = fetch_byte(f);
+			} while (is_idchr(c));
+
+			/* do something with the identifier here - macros, etc. */
+			outstr(ident);
+			lw_free(ident);
+
+			goto again;
+		}
+
+		switch (c)
+		{
+		default:
+			outchr(c);
+			break;
+
+		case '.': // a number - to prevent seeing an identifier in middle of number
+			outchr(c);
+			c = fetch_byte(f);
+			if (!is_dec(c))
+				goto again;
+			/* fall through */
+		case '0':
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			do
+			{
+				outchr(c);
+				c = fetch_byte(f);
+				if (c == CPP_EOF)
+					return;
+				if (is_ep(c))
+				{
+					outchr(c);
+					c = fetch_byte(f);
+					if (c == '-' || c == '+')
+					{
+						outchr(c);
+						c = fetch_byte(f);
+					}
+				}
+			} while ((is_idchr(c)) || (c == '.'));
+			goto again;
+
+		case '#':
+			if (nlws)
+			{
+				handle_directive(f);
+				/* note: no need to reset nlws */
+			}
+			else
+				outchr('#');
+			break;
+
+		case '\'': // character constant
+			outchr('\'');
+			while ((c = fetch_byte(f)) != '\'')
+			{
+				if (c == '\\')
+				{
+					outchr('\\');
+					c = fetch_byte(f);
+				}
+				if (c == CPP_EOL)
+				{
+					do_warning("Unterminated character constant");
+					goto again;
+				}
+				if (c == CPP_EOF)
+					return;
+				outchr(c);
+			}
+			outchr(c);
+			break;
+
+		case '"': // strings
+			outchr(c);
+			while ((c = fetch_byte(f)) != '"')
+			{
+				if (c == '\\')
+				{
+					outchr('\\');
+					c = fetch_byte(f);
+				}
+				if (c == CPP_EOL)
+				{
+					do_warning("unterminated string literal");
+					goto again;
+				}
+				if (c == CPP_EOF)
+					return;
+				outchr(c);
+			}
+			outchr(c);
+			break;
+
+		case '/': // comments
+			c = fetch_byte(f);
+			if (c == '/')
+			{
+				// line comment
+				outchr(' ');
+				do
+				{
+					c = fetch_byte(f);
+				} while (c != CPP_EOF && c != CPP_EOL);
+			}
+			else if (c == '*')
+			{
+				// block comment
+				for (;;)
+				{
+					c = fetch_byte(f);
+					if (c == CPP_EOF)
+					{
+						break;
+					}
+					if (c == CPP_EOL)
+					{
+						continue;
+					}
+					if (c == '*')
+					{
+						// maybe end of comment
+						c = fetch_byte(f);
+						if (c == '/')
+						{
+							// end of comment
+							break;
+						}
+					}
+				}
+				// replace comment with a single space
+				outchr(' ');
+				if (nlws == -1)
+					nlws = 1;
+				continue;
+			}
+			else
+			{
+				// restore eaten '/'
+				outchr('/');
+				// process the character we just fetched
+				goto again;
+			}
+		} // switch
+	} // processing loop
+}
+
+int process_file(const char *f)
+{
+	struct file_stack_e *nf;
+	FILE *fp;
+
+	fprintf(stderr, "Processing %s\n", f);
+
+	if (strcmp(f, "-") == 0)
+		fp = stdin;
+	else
+		fp = fopen(f, "rb");
+	if (fp == NULL)
+	{
+		do_warning("Cannot open %s: %s", f, strerror(errno));
+		return -1;
+	}
+
+	/* push the file onto the file stack */
+	nf = lw_alloc(sizeof(struct file_stack_e));
+	nf -> fn = f;
+	nf -> fp = fp;
+	nf -> next = file_stack;
+	nf -> line = 1;
+	nf -> col = 0;
+	nf -> qseen = 0;
+	nf -> ra = CPP_NOUNG;
+	nf -> unget = CPP_NOUNG;
+	file_stack = nf;
+
+	/* go preprocess the file */
+	preprocess_file(nf);
+
+	if (nf -> fp != stdin)
+		fclose(nf -> fp);
+	file_stack = nf -> next;
+	lw_free(nf);
+	return 0;
+}
diff -r 83f682ed4d65 -r 40ecbd5da481 lwcc/cpp/main.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lwcc/cpp/main.c	Sun Sep 08 21:58:12 2013 -0600
@@ -0,0 +1,134 @@
+/*
+lwcc/cpp/main.c
+
+Copyright © 2013 William Astle
+
+This file is part of LWTOOLS.
+
+LWTOOLS is free software: you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+more details.
+
+You should have received a copy of the GNU General Public License along with
+this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <lw_stringlist.h>
+#include <lw_cmdline.h>
+
+#include "cpp.h"
+
+/* command line option handling */
+#define PROGVER "lwcc-cpp from " PACKAGE_STRING
+char *program_name;
+
+/* input files */
+lw_stringlist_t input_files;
+
+/* various flags */
+int trigraphs = 0;
+char *output_file = NULL;
+FILE *output_fp = NULL;
+
+static struct lw_cmdline_options options[] =
+{
+	{ "output", 'o', "FILE", 0, "Output to FILE"},
+	{ "include", 'i', "FILE", 0, "Pre-include FILE" },
+	{ "includedir", 'I', "PATH", 0, "Add entry to the user include path" },
+	{ "sincludedir", 'S', "PATH", 0, "Add entry to the system include path" },
+	{ "define", 'D', "SYM[=VAL]",0, "Automatically define SYM to be VAL (or 1)"},
+	{ "trigraphs", 0x100, NULL, 0, "Enable interpretation of trigraphs" },
+	{ 0 }
+};
+
+
+static int parse_opts(int key, char *arg, void *state)
+{
+	switch (key)
+	{
+	case 'o':
+		if (output_file)
+			do_error("Output file specified more than once.");
+		output_file = arg;
+		break;
+
+	case 0x100:
+		trigraphs = 1;
+		break;
+
+	case lw_cmdline_key_end:
+		break;
+
+	case lw_cmdline_key_arg:
+		lw_stringlist_addstring(input_files, arg);
+		break;
+
+	default:
+		return lw_cmdline_err_unknown;
+	}
+	return 0;
+}
+
+static struct lw_cmdline_parser cmdline_parser =
+{
+	options,
+	parse_opts,
+	"INPUTFILE",
+	"lwcc-cpp - C preprocessor for lwcc",
+	PROGVER
+};
+
+int main(int argc, char **argv)
+{
+	program_name = argv[0];
+	int retval = 0;
+
+	input_files = lw_stringlist_create();
+
+	/* parse command line arguments */
+	lw_cmdline_parse(&cmdline_parser, argc, argv, 0, 0, NULL);
+
+	/* set up output file */
+	if (output_file == NULL || strcmp(output_file, "-") == 0)
+	{
+		output_fp = stdout;
+	}
+	else
+	{
+		output_fp = fopen(output_file, "wb");
+		if (output_fp == NULL)
+		{
+			do_error("Failed to create output file %s: %s", output_file, strerror(errno));
+		}
+	}
+
+	if (lw_stringlist_nstrings(input_files) == 0)
+	{
+		/* if no input files, work on stdin */
+		retval = process_file("-");
+	}
+	else
+	{
+		char *s;
+		lw_stringlist_reset(input_files);
+		for (s = lw_stringlist_current(input_files); s; s = lw_stringlist_next(input_files))
+		{
+			retval = process_file(s);
+			if (retval != 0)
+				break;
+		}
+	}
+	lw_stringlist_destroy(input_files);
+	exit(retval);
+}
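
The three input filters listed in the NOTES of lwcc/cpp/file.c (line-ending normalization, trigraph replacement, then backslash-newline splicing) can be illustrated with a minimal sketch. It is not part of the changeset. It assumes whole NUL-terminated strings instead of the byte-at-a-time stream used by fetch_byte(), uses '\n' where the patch uses CPP_EOL, and takes '-' as the third character of the tilde trigraph, as the C standard defines it. The names trigraph_replacement(), normalize_eol(), replace_trigraphs(), and splice_lines() are hypothetical helpers invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: map the third character of a trigraph (two question
   marks followed by c) to its replacement, or return 0 if there is none. */
static int trigraph_replacement(int c)
{
	switch (c)
	{
	case '=': return '#';
	case '/': return '\\';
	case '\'': return '^';
	case '(': return '[';
	case ')': return ']';
	case '!': return '|';
	case '<': return '{';
	case '>': return '}';
	case '-': return '~';
	}
	return 0;
}

/* Filter 1: turn CRLF, LFCR, CR, and LF into a single '\n'.
   dst must have room for strlen(src) + 1 bytes. */
size_t normalize_eol(const char *src, char *dst)
{
	size_t n = 0;

	while (*src)
	{
		char c = *src++;
		if (c == '\r' || c == '\n')
		{
			/* swallow the other half of a CRLF or LFCR pair */
			if ((*src == '\r' || *src == '\n') && *src != c)
				src++;
			c = '\n';
		}
		dst[n++] = c;
	}
	dst[n] = '\0';
	return n;
}

/* Filter 2: replace each recognized trigraph with one character. */
size_t replace_trigraphs(const char *src, char *dst)
{
	size_t n = 0;
	int r;

	while (*src)
	{
		if (src[0] == '?' && src[1] == '?'
			&& (r = trigraph_replacement(src[2])) != 0)
		{
			dst[n++] = (char)r;
			src += 3;
		}
		else
		{
			dst[n++] = *src++;
		}
	}
	dst[n] = '\0';
	return n;
}

/* Filter 3: delete every backslash that is directly followed by '\n',
   together with that newline (line splicing). */
size_t splice_lines(const char *src, char *dst)
{
	size_t n = 0;

	while (*src)
	{
		if (src[0] == '\\' && src[1] == '\n')
			src += 2;
		else
			dst[n++] = *src++;
	}
	dst[n] = '\0';
	return n;
}
```

Applying the filters in this order matters: a trigraph that yields a backslash at the end of a CRLF-terminated line only becomes a line splice once the first two filters have run. When writing test input as C string literals, the question marks are best escaped as \? so the host compiler does not interpret them as trigraphs itself.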