TC-1 Code to Write

Be sure to read RE/flex and Bison documentations and tutorials, see RE/flex & Bison.

See the lecture notes, and read the C++ chapter of GNU Bison’s documentation.

Pay special attention to its ‘’Complete C++ Example’’ which is very much like our set up.

Makefile.am

Include your own testsuite in the tests directory. Some examples of tests are given in the tests folder (Given Test Cases). Do not hesitate to make use of them.

Include your own test suite in the tests directory, and hook it to make check. See Generating The Test Driver.

lib/misc/symbol.*, lib/misc/unique.*

The class misc::symbol keeps a single copy of identifiers, see The lib/misc Directory. Its implementation in lib/misc/symbol.hxx and lib/misc/symbol.cc is incomplete. Note that running make check at the root of the project will produce in lib/misc unit testing binaries. Having this unit test lib/misc/test-symbol.cc pass should be a goal by itself. As a matter of fact, unit tests were left to help you: once they pass successfully you may proceed to the rest of the compiler. misc::symbol’s implementation is based on misc::unique, a generic class implementing the Flyweight design pattern. The definition of this class, lib/misc/unique.hxx, is also to be completed.

lib/misc/variant.*

The implementation of the class template misc::variant<T0, Ts...> lacks a couple of conversion operators that you have to supply.

src/parse/scantiger.ll
Specifications:

The scanner must be able to scan inputs as described in Lexical Specifications and Additional Lexical Specifications.

What is RE/flex:

RE/flex is a C++ library used to generate lexical analyzers that can interact with Bison. Its purpose is to generate a C++ lexer class with a lex method, which can be called to initiate the lexical analysis of the input data and interact with Bison to produce a complete parser.

The file has a special syntax which is explained here.

More about RE/flex & Bison.

What does RE/flex generate?

RE/flex generates two files parsetiger.hh and parsetiger.cc these files contain the Lexer class which has a lex method that allows the lexer to be launched. Bison will use it to feed the parser with tokens.

How do I write a rule with RE/flex?

The goal of a lexer is to produce a token from an input stream. For that, RE/flex uses a Maximal Munch algorithm and thus proposes to define patterns as regex that will, when matched, create a token.

Here is an example of how to create a token from a regex to create a array token:

%%

"array"       return parser::make_ARRAY(td.location); // or `return TOKEN(ARRAY);

%%

To create the tokens in the parser you have to use a bison function which is defined for each token, which is : parser::make_TOKEN(...).

The macro TOKEN() allows to simplify the creation of tokens because Bison’s syntax for creating tokens is verbose.

You can find out how to write a rule on the RE/flex documentation.

To have more readability on the regexes you can define them before using them as done on this example.

RE/flex and sublexer:

To be able to lex the strings you will have to use sublexers, here is the syntax of the sublexers:

...
"\""        {grown_string.clear(); start(SC_STRING);}

/* Handling of the strings.  Initial " is eaten. */

<SC_STRING>{
"\"" {
  start(INITIAL);
  return TOKEN_VAL(STRING, grown_string);
}

...

\\x[0-9a-fA-F]{2}  {
  grown_string += strtol(text() + 2, 0, 16);
}

...
}

On RE/flex documentation, sublexers are called states.

Strings will be stored as C++ std::string. See the following code for the basics.

Symbols (i.e. identifiers) must be returned as misc::symbol objects, not strings.

Metavariable:

Be carefull, metavariable (Additional Lexical Specifications) keywords should only be scanned when extensions are enabled, i.e. not on direct user input.

Trace the lexing

To keep track of lexing you can enable it using the set_debug(true) method on the lexer.

Location tracking

The locations are tracked. The class Location to use is produced by Bison: src/parse/location.hh.

To track locations, adjust your scanner, use YY_USER_ACTION and the lex prologue:

...
%%
%{
  // Everything here is run each time :code:`lex` is *invoked*.
%}

/* The rules. */

"if"    return TOKEN(IF);
...
%%
src/parse/parsetiger.yy
Complete parser

The parser must be able to parse tokens as described in Syntactic Specifications and Additional Syntactic Specifications and must not produce comportment yet.

What is Bison?

Bison is a parser generator, allowing to generate parsers of the LR family. Bison is a tool created by the GNU and maintained by Akim Demaille the creator of the Tiger project (Fair Criticism). The goal of Bison is to write grammar rules in YACC (Yet another Compiler Compiler) format in order to be able to parse a language.

What does Bison generate?

Bison generates two files parsetiger.hh and parsetiger.cc which represent the parser. The parser is a class named Parser which contains a parse(...) method to start parsing.

What is the structure of a Bison file?

A bison file has a special syntax because it will generate C++. Here is a link to the documentation for more information.

How to write a Bison rule:

To write a rule with bison you have to start from the EBNF grammar defined in the Syntactic Specifications and Additional Syntactic Specifications, then adapt it to the YACC syntax that bison uses.

Here is an example of Bison code that parses this rule:

(* === Expressions. === *)
exp =
  (* Literals. *)
  | INT
  | STRING
  ;

In bison syntax:

exp:
   INT
   | STRING;

In this case INT and STRING are tokens created by the lexer.

During the development of your parser you will have problems with conflicts, there are two types of conflicts Shift/Reduce and Reduce/Reduce.

In order to solve these different problems I invite you to read this documentation on the Bison algorithm.

For some cases such as function arguments you will need to use recursive rules.

GLR parser

We use %glr-parser and %skeleton "glr2.cc" directives to have Bison generate a GLR parser. Thanks to GLR, some conflicts (S/R and/or R/R) can be resolved at runtime. Use %expect and %expect-rr to specify their number. For information, we allow zero R/R conflicts, and one S/R related to the “big lvalue” issue.

Trace the parsing

Use %printer to implement --parse-trace support for terminals and non terminals (see TC-1 Samples). For instance,

%define api.value.type variant
%token <int> INT "integer"
%printer { yyo << $$; } <int>
src/parse/tiger-driver.cc

The class TigerDriver drives the lexing and parsing of input file. Its implementation in src/parse/tiger-driver.cc is incomplete.

You have to add the creation of the lexer, the parser and launch the parsing.

TigerDriver acts as a bridge between the lexer, the parser and the error handling, among other things. For more information, see old/02-parser-scanner.pdf: most of what you need to know is detailed here.