GF Developers Guide

Authors: Björn Bringert, Krasimir Angelov and Thomas Hallgren
Last update: 2016-06-17, 12:22



Before you start

This guide is intended for people who want to contribute to the development of the GF compiler or the Resource Grammar Library. If you are a GF user who just wants to download and install GF (e.g to develop your own grammars), the simpler guide on the GF download page should be sufficient.

Setting up your system for building GF

To build GF from source you need to install some tools on your system: the Haskell Platform, Darcs and the Haskeline library.

On Linux the best option is to install the tools via the standard software distribution channels, i.e. by using the Software Center in Ubuntu or the corresponding tool in other popular Linux distributions. Or, from a Terminal window, the following command should be enough:

On Mac OS and Windows, the tools can be downloaded from their respective web sites, as described below.

The Haskell Platform

GF is written in Haskell, so first of all you need the Haskell Platform, e.g. version 7.10.3. Downloads and installation instructions are available from here:

http://hackage.haskell.org/platform/

Once you have installed the Haskell Platform, open a terminal (Command Prompt on Windows) and try to execute the following command:

  $ ghc --version

This command should show you which version of GHC you have. If the installation of the Haskell Platform was successful you should see a message like:

  The Glorious Glasgow Haskell Compilation System, version 7.8.3

Other required tools included in the Haskell Platform are Cabal, Alex and Happy.

Darcs

To get the GF source code, you also need Darcs, version 2 or later. Darcs 2.10 is recommended (July 2015).

Darcs is a distributed version control system, see http://darcs.net/ for more information. There are precompiled packages for many platforms available and source code if you want to compile it yourself. Darcs is also written in Haskell and so you can use GHC to compile it.

The haskeline library

GF uses haskeline to enable command line editing in the GF shell. This should work automatically on Mac OS and Windows, but on Linux one extra step is needed to make sure the C libraries (terminfo) required by haskeline are installed. Here is one way to do this:

Getting the source

Once you have all tools in place you can get the GF source code. If you just want to compile and use GF then it is enough to have read-only access. It is also possible to make changes in the source code but if you want these changes to be applied back to the main source repository you will have to send the changes to us. If you plan to work continuously on GF then you should consider getting read-write access.

Read-only access

Getting a fresh copy for read-only access

Anyone can get the latest development version of GF by running (all on one line):

  $ darcs get --lazy --set-scripts-executable http://www.grammaticalframework.org/ gf

This will create a directory called gf in the current directory.

Updating your copy

To get all new patches from the main repo:

  $ darcs pull -a

This can be done anywhere in your local repository, i.e. in the gf directory, or any of its subdirectories. Without -a, you can choose which patches you want to get.

Recording local changes

Since every copy is a repository, you can have local version control of your changes.

If you have added files, you first need to tell your local repository to keep them under revision control:

  $ darcs add file1 file2 ...

To record changes, use:

  $ darcs record

This creates a patch against the previous version and stores it in your local repository. You can record any number of changes before pushing them to the main repo. In fact, you don't have to push them at all if you want to keep the changes only in your local repo.

If you think there are too many questions about what to record, you can answer f to record all remaining changes in the current file, or s to skip them. Use ? to get a list of more options.

Submitting patches

If you are using read-only access, send your patches by email to someone with write-access. First record your changes in your local repository, as described above. You can send any number of recorded patches as one patch bundle. You create the patch bundle with:

  $ darcs send -o mypatch.patch
  $ gzip mypatch.patch

(where mypatch is hopefully replaced by a slightly more descriptive name). Since some e-mail setups change text attachments (most likely by changing the newline characters) you need to send the patch in some compressed format, such as GZIP, BZIP2 or ZIP.

Send it as an e-mail attachment.

Read-write access

If you have a user account on www.grammaticalframework.org, you can get read-write access over SSH to the GF repository.

Getting a fresh copy

Get your copy with (all on one line), replacing user with your own user name on www.grammaticalframework.org:

  $ darcs get --lazy --set-scripts-executable user@www.grammaticalframework.org:/usr/local/www/GF/ gf

The option --lazy means that darcs defers downloading all the history for the repository. This saves space, bandwidth and CPU time, and most people don't need the full history of all changes in the past.

Updating your copy

Get all new patches from the main repo:

  $ darcs pull

You can add -a to get all patches without answering yes/no to each patch.

Commit your changes

There are two steps to commiting a change to the main repo. First you have to record the changes that you want to commit (see Recording local changes above), then you push them to the main repo. If you are using ssh-access, all you need to do is:

  $ darcs push

If you use the -a flag to push, all local patches which are not in the main repo are pushed.

Apply a patch from someone else

Use:

  $ darcs apply < mypatch.patch

This applies the patch to your local repository. To commit it to the main repo, use darcs push.

Further information about Darcs

For more info about what you can do with darcs, see http://darcs.net/manual/

Compilation from source with Cabal

The build system of GF is based on Cabal, which is part of the Haskell Platform, so no extra steps are needed to install it. In the simplest case, all you need to do to compile and install GF, after downloading the source code as described above, is

  $ cd gf
  $ cabal install

This will automatically download any additional Haskell libraries needed to build GF. If this is the first time you use Cabal, you might need to run cabal update first, to update the list of available libraries.

If you want more control, the process can also be split up into the usual configure, build and install steps.

Configure

During the configuration phase Cabal will check that you have all necessary tools and libraries needed for GF. The configuration is started by the command:

  $ cabal configure

If you don't see any error message from the above command then you have everything that is needed for GF. You can also add the option -v to see more details about the configuration.

You can use cabal configure --help to get a list of configuration options.

Build

The build phase does two things. First it builds the GF compiler from the Haskell source code and after that it builds the GF Resource Grammar Library using the already build compiler. The simplest command is:

  $ cabal build

Again you can add the option -v if you want to see more details.

Parallel builds

If you have Cabal>=1.20 you can enable parallel compilation by using

  $ cabal build -j

or by putting a line

  jobs: $ncpus

in your .cabal/config file. Cabal will pass this option to GHC when building the GF compiler, if you have GHC>=7.8.

Cabal also passes -j to GF to enable parallel compilation of the Resource Grammar Library. This is done unconditionally to avoid causing problems for developers with Cabal<1.20. You can disable this by editing the last few lines in WebSetup.hs.

Partial builds

NOTE: The following doesn't work with recent versions of cabal.

Sometimes you just want to work on the GF compiler and don't want to recompile the resource library after each change. In this case use this extended command:

  $ cabal build rgl-none

The resource library could also be compiled in two modes: with present tense only and with all tenses. By default it is compiled with all tenses. If you want to use the library with only present tense you can compile it in this special mode with the command:

  $ cabal build present

You could also control which languages you want to be recompiled by adding the option langs=list. For example the following command will compile only the English and the Swedish language:

  $ cabal build langs=Eng,Swe

Install

After you have compiled GF you need to install the executable and libraries to make the system usable.

  $ cabal copy
  $ cabal register

This command installs the GF compiler for a single user, in the standard place used by Cabal. On Linux and Mac this could be $HOME/.cabal/bin. On Mac it could also be $HOME/Library/Haskell/bin. On Windows this is C:\Program Files\Haskell\bin.

The compiled GF Resource Grammar Library will be installed under the same prefix, e.g. in $HOME/.cabal/share/gf-3.3.3/lib on Linux and in C:\Program Files\Haskell\gf-3.3.3\lib on Windows.

If you want to install in some other place then use the --prefix option during the configuration phase.

Clean

Sometimes you want to clean up the compilation and start again from clean sources. Use the clean command for this purpose:

  $ cabal clean

Known problems with Cabal

Some versions of Cabal (at least version 1.16) seem to have a bug that can cause the following error:

  Configuring gf-3.x...
  setup: Distribution/Simple/PackageIndex.hs:124:8-13: Assertion failed

The exact cause of this problem is unclear, but it seems to happen during the configure phase if the same version of GF is already installed, so a workaround is to remove the existing installation with

  ghc-pkg unregister gf

You can check with ghc-pkg list gf that it is gone.

Compilation with make

If you feel more comfortable with Makefiles then there is a thin Makefile wrapper arround Cabal for you. If you just type:

  $ make

the configuration phase will be run automatically if needed and after that the sources will be compiled.

For installation use:

  $ make install

For cleaning:

  $ make clean

Compiling GF with C run-time system support

The C run-time system is a separate implementation of the PGF run-time services. It makes it possible to work with very large, ambiguous grammars, using probabilistic models to obtain probable parses. The C run-time system might also be easier to use than the Haskell run-time system on certain platforms, e.g. Android and iOS.

To install the C run-time system, go to the src/runtime/c directory and use the install.sh script:

  bash setup.sh configure
  bash setup.sh build
  bash setup.sh install

This will install the C header files and libraries need to write C programs that use PGF grammars. Some example C programs are included in the utils subdirectory, e.g. pgf-translate.c.

When the C run-time system is installed, you can install GF with C run-time support by doing

  cabal install -fserver -fc-runtime

from the top directory. This give you three new things:

Python and Java bindings

The C run-time system can also be used from Python and Java. Python and Java bindings are found in the src/runtime/python and src/runtime/java directories, respecively. Compile them by following the instructions in the INSTALL files in those directories.

Creating binary distribution packages

Creating .deb packages for Ubuntu

This was tested on Ubuntu 14.04 for the release of GF 3.6, and the resulting .deb packages appears to work on Ubuntu 12.04, 13.10 and 14.04. For the release of GF 3.7, we generated .deb packages on Ubuntu 15.04 and tested them on Ubuntu 12.04 and 14.04.

Under Ubuntu, Haskell executables are statically linked against other Haskell libraries, so the .deb packages are fairly self-contained.

Preparations

  sudo apt-get install dpkg-dev debhelper

Creating the package

Make sure the debian/changelog starts with an entry that describes the version you are building. Then run

  make deb

If get error messages about missing dependencies (e.g. autoconf, automake, libtool-bin, python-dev, java-sdk, txt2tags) use apt-get intall to install them, then try again.

Creating OS X Installer packages

Run

  make pkg

Creating binary tar distributions

Run

  make bintar

Creating .rpm packages for Fedora

This is possible, but the procedure has not been automated. It involves using the cabal-rpm tool,

  sudo yum install cabal-rpm

and following the Fedora guide How to create an RPM package.

Under Fedora, Haskell executables are dynamically linked against other Haskell libraries, so .rpm packages for all Haskell libraries that GF depends on are required. Most of them are already available in the Fedora distribution, but a few of them might have to be built and distributed along with the GF .rpm package. When building .rpm packages for GF 3.4, we also had to build .rpms for fst and httpd-shed.

Running the testsuite

NOTE: The test suite has not been maintained recently, so expect many tests to fail.

GF has testsuite. It is run with the following command:

  $ cabal test

The testsuite architecture for GF is very simple but still very flexible. GF by itself is an interpreter and could execute commands in batch mode. This is everything that we need to organize a testsuite. The root of the testsuite is the testsuite/ directory. It contains subdirectories which themself contain GF batch files (with extension .gfs). The above command searches the subdirectories of the testsuite/ directory for files with extension .gfs and when it finds one it is executed with the GF interpreter. The output of the script is stored in file with extension .out and is compared with the content of the corresponding file with extension .gold, if there is one. If the contents are identical the command reports that the test was passed successfully. Otherwise the test had failed.

Every time when you make some changes to GF that have to be tested, instead of writing the commands by hand in the GF shell, add them to one .gfs file in the testsuite and run the test. In this way you can use the same test later and we will be sure that we will not incidentaly break your code later.

If you don't want to run the whole testsuite you can write the path to the subdirectory in which you are interested. For example:

  $ cabal test testsuite/compiler

will run only the testsuite for the compiler.