Interpreter for CPP
Implementing Programming Languages, Assignment 3
Aarne Ranta (aarne (at) chalmers.se)

%!target:html

%!postproc(html): #NEW <!-- NEW -->


=Summary=

The objective of this assignment is to write an interpreter for a fragment
of the C++ programming language. The interpreter should run programs
and correctly perform all their input and output actions.

Before the work can be submitted, the interpreter has to pass some
tests, which are given on the book web page via links
later in this document. 

The recommended implementation is via a BNF
grammar processed by the BNF Converter (BNFC) tool. The syntax
tree created by the parser is first type checked by using the
type checker created in [Assignment 2 ../assignment2/assignment2.html]. The interpreter
should then make another pass of the type-checked code.


#NEW

The approximate size of the grammar is 50 rules, and the interpreter
code should be 100-300 lines, depending on the programming language
used.

All BNFC supported languages can be used, but guidance is guaranteed
only for Haskell and Java.

The semantics is partially characterized by formal rules in
Chapter 5 of the book.


#NEW

=Method=

The recommended procedure is two passes:
+ build a symbol table that for every function gives it source code syntax tree;
  the built-in functions can be left out and treated separately in the rule for
  eveluating function calls
+ interpret the program by eveluating the expression ``main()``


You can use the files in either of the directories
- [Haskell package haskell/]
- [Java package java/]


Copy your ``CPP.cf`` grammar and the
``TypeChecker`` module from Assignment 2 to the same directory.

Edit the file ``Interpreter.hs`` or ``Interpreter.class``
till it implements a complete interpreter. One way of doing
this is to copy the contents of ``TypeChecker`` and modify
them - the interpreter will be structurally very similar to
the type checker.


#NEW

=Language specification=

The language is the same as in Assignment 2, and you can use the
grammar file [``CPP.cf`` ../../examples/CPP.cf].
Also its type system is the same.

There are six built-in functions:
```
  void   printInt(int x)        // print an integer and a newline in standard output
  void   printDouble(double x)  // print a double and a newline in standard output
  void   printString(string x)  // print a string and a newline in standard output
  int    readInt()              // read an integer from standard input
  double readDouble()           // read a double from standard input
  string readString()           // read a string from standard input
```
The implementation of these functions is a part of the interpreter.


#NEW

=Values=

There are five types of values:
- integer values, e.g. -47
- double values, e.g. 3.14159
- string values, e.g. "hello world"
- boolean values, ``true`` and ``false``
- a void value, which need never be shown


Instead of boolean values, you may use integers.
Then ``true`` can be interpreted as 1 and ``false`` as 0.

Values can be seen as a special case of expressions: as expressions
that contain no variables and cannot be evaluated further.
But it is recommended to have a separate datatype of values,
in order to guarantee that evaluation always results in a value.

Thus the evaluation of an expression in an environment should always
result in a value.


#NEW

=Operational Semantics=

==Programs==

A program is a sequence of function definitions. 
Each function has a parameter list and a body, which
is a sequence of statements.

The evaluation of a function call starts by evaluting the arguments
and building an environment where the received values are assigned
to the argument variables (a.k.a. parameters) of the function. 

The statements in the body are then executed in
the order defined by their textual order as altered by ``while``
loops and ``if`` conditions. 

The function returns a value, which is obtained from a ``return``
statement. This statement can be assumed to be the in the last statement of 
the function body: either alone, or in the branches of an ``if-else``
statement.
If the return type is ``void``, no return statement is
required.


#NEW

==Statements==

A declaration, e.g. 
```
  int i ;
```
adds a variable to the current context. Its value
is initialized if and only if the declaration includes an
initializing expression, e.g.
```
  int i = 9 ;
```
An expression statement, e.g. 
```
  i++ ;
```
is evaluated, and its value is ignored.

A block of statements, e.g.
```
  {
    int i = 3 ;
    i++ ;
  }
```
is interpreted in an environment where a new context is pushed on 
the context stack at entrance, and popped at exit. 

A ``while`` statement, e.g.
```
  while (1 < 10){
    i++ ;
  }
```
is interpreted so that the condition expression is first evaluated.
If the value is ``true``, the body is interpreted in the resulting
environment, and the ``while`` statement is executed again.
If the value is ``false``, the statement after the ``while`` statement
is interpreted.

An ``if-else`` statement, e.g.
```
  if (1 < 10) i++ ; else j++ ;
```
is interpreted so that the condition expression is first evaluated.
If the value is ``true``, the statement before ``else`` is interpreted.
If the value is ``false``, the statement after ``else`` is interpreted.

A ``return`` statement is executed by evaluating its expression argument.
The value is returned to the caller of the function, and no more
statements in the function body are executed. 


#NEW

==Expressions==

The interpretation of an expression, also called evaluation,
returns a value whose type is determined by the type of
the expression.

A literal, e.g.
```
  123
  3.14
  true
  "hello world"
```
is not evaluated further but just converted to the corresponding value.

A variable, e.g.
```
  x
```
is evaluated by looking up its value in the innermost context where
it occurs. If the variable is not in the context, or has no value there,
the interpreter terminates with an error message
```
  uninitialized variable x
```
A function call, e.g.
```
  printInt(8 + 9)
```
is interpreted by first evaluating its arguments from left to right.
The environment is then looked up to find out how the function is
interpreted on the resulting values. Alternatively, since there are
only four function calls, they can be hard-coded in the
expression evaluation code.

A postincrement,
```
  i++
```
has the same value as its body initially has
(here ``i``). The value of the variable
``i`` is then incremented by 1. 
``i--`` correspondingly  decrements ``i`` by 1. 
If ``i`` is of type ``double``, 1.0 is used instead.

A preincrement,
```
  ++i
```
has the same value as ``i`` plus 1. This incremented value 
replaces the old value of ``i``. The decrement and double
variants are analogous.

The arithmetic operations addition, subtraction, multiplication, and division,
```
  a + b
  a - b
  a * b
  a / b
```
are interpreted by evaluating their operands from left to right.
The resulting values are then added, subtracted, etc., by using
appropriate operations of the implementation language. We are not
picky about the precision chosen, but suggest for simplicity
that ``int`` should be ``int`` and ``double`` should be ``double``.

Addition expressions for string arguments are interpreted by
concatenation, without any intervening spaces.

Comparisons,
```
  a <  b
  a >  b
  a >= b
  a <= b
  a == b
  a != b
```
are treated similarly to the arithmetic operations, using
comparisons of the implementation language. The returned 
value must be boolean (or an integer, if you use integers
to represent booleans).

Conjunction,
```
  a && b
```
is evaluated //lazily//: first ``a`` is evaluated. If the
result is ``true``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``false``, then ``false``
is returned without evaluating ``b``.

Disjunction,
```
  a || b
```
is also evaluated lazily: first ``a`` is evaluated. If the
result is ``false``, also ``b`` is evaluated, and the
value of ``b`` is returned. However, if
``a`` evaluates to ``true``, then ``true``
is returned without evaluating ``b``.

Assignment,
```
  x = a
```
is evaluated by first evaluating ``a``. The resulting
value is returned, but also the context is changed by assigning
this value to the innermost occurrence of ``x``.


#NEW

=Solution format=

==Input and output==

The interpreter must be a program called ``icpp``, which is 
executed by the command
```
  icpp <SourceFile>
```
and prints its output to the standard output. The output at success
must be just the output defined by the interpreter.

The output at failure is an interpreter error, or a
``TYPE ERROR`` as in Assignment 2, or a
``SYNTAX ERROR`` as in Assignment 1.

The input can be read not only from user typing on the
terminal, but also from standard input redirected from
a file or by ``echo``. For instance,
```
  ./icpp fibonacci.cc <test-input

  echo 20 | ./icpp fibonacci.cc
```

The easiest way to produce the proper format is to use the ready-made files in either of
- [Haskell package haskell/]
- [Java package java/]


#NEW

==Example of success==

Source file
```
// file good.cc

int main () 
{
  int i = readInt() ; //5

  printInt(i) ;   //5
  printInt(i++) ; //5
  printInt(i) ;   //6
  printInt(++i) ; //7
  printInt(i) ;   //7

}
```
Running the interpreter
```
  % echo 3 | ./icpp good.cc
  3
  3
  4
  5
  5  
```

#NEW

==Example of failure==

Source file
```
// file bad.cc

int main () 
{
  int i ;

  printInt(i) ;
  printInt(i++) ;
  printInt(i) ;
  printInt(++i) ;
  printInt(i) ;

}
```
Running the interpreter
```
  % icpp bad.cc
  uninitialized variable x
```
Thus it is assumed that the type checker does not detect
uninitialized variables.


#NEW

==Compiling the interpreter==

The interpreter is submitted as a source file package that can be
compiled by typing ``make``.
The file names must match the ready-made files in either the
[Haskell package haskell/] or the [Java package java/].
The simplest solution is to copy the contents of these files
and replace the grammar, the type checker, and the interpreter
by your own files.

If you want to write the interpreter in another language, the
procedure is the same: send a tar package
and make sure the interpreter can be compiled in a normal Unix enviroment
by typing ``make``.


#NEW

=Test programs=

Run the [test suite testsuite/] before submitting the assignment.


=Success criteria=

Your interpreter must pass the [test suite testsuite/].
The test suite contains only correct programs; the assumption
is that the type checker has rejected rejected the incorrect
programs.

The solution must be written in an easily readable and
maintainable way.