Checking GF Translation Dictionaries

Aarne Ranta
May 2015

News

28/5/2015 minor corrections in the text

9/5 Link to the current status: https://docs.google.com/spreadsheets/d/1NuLRp86UPjd298LxjhCAGlHsoPypxKpcBJfDab0De90/edit#gid=0

9/5/2014 Removed many bogus subcats revealed by dictionary authors and by FrameNet. Please upgrade your TopDictionary from darcs or GitHub!

Call for contributions: the generic translation dictionaries of GF

Wanted: manual checking of TopDictionary???.gf files in this directory.

Abstract syntax: TopDictionary, the top-7000 English words from the British National Corpus, as sorted by frequency here.

Usage: part of the general translation dictionaries, used for instance in the GF translation demo. The full dictionaries are the Dictionary* modules in the parent directory.

Who: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.

How to do it

Follow these steps for your language. The examples use ToCheckFre.gf; replace Fre with the code of any language in this directory.

  1. Make sure to download the latest version of the file.
  2. Make sure you can compile the original file:
        gf ToCheckFre.gf +RTS -K64M
    
  3. Edit the lin rules line by line, starting from the beginning. Follow the guidelines in the next section.
  4. Mark the last rule you edit with "---- END edits by AR", where AR stands for your initials.
  5. Put, as the first line of the file, a comment indicating your last edited rule:
        ---- checked by AR till once_Adv in the BNC order
    
  6. Make sure the resulting file compiles again.
  7. Run diff on the old and the new file to make sure your changes look reasonable.
  8. Commit your edits to darcs, if you have access to it, or to GF Contributions, or send them by email to Aarne Ranta. In the last case, it is enough to send the lin rules that you have processed.
  9. Inform the gf-dev list that you have done this.
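
Before submitting, it can help to check mechanically that the two bookkeeping comments from steps 4 and 5 are in place. The following is only a minimal sketch: check_markers is a hypothetical helper, not part of GF, and it assumes that both comments are written exactly in the form shown in the steps above.

```shell
# Hypothetical helper (not part of GF): check the bookkeeping comments
# described in steps 4 and 5.
# Usage: check_markers FILE INITIALS
check_markers() {
    file=$1
    initials=$2

    # Step 5: the first line should be a "checked by" comment with your initials.
    if ! head -n 1 "$file" | grep -q -e "^---- checked by $initials till "; then
        echo "$file: first line is not a '---- checked by $initials till ...' comment" >&2
        return 1
    fi

    # Step 4: an END marker with your initials should mark the last edited rule.
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    count=$(grep -c -e "---- END edits by $initials" "$file" || true)
    if [ "$count" -lt 1 ]; then
        echo "$file: no '---- END edits by $initials' marker found" >&2
        return 1
    fi

    return 0
}
```

For example, after sourcing the function, "check_markers ToCheckFre.gf AR" returns 0 when both comments are present and prints a short message otherwise. It does not replace compiling the file in step 6 or reading the diff in step 7.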

A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work, don't spend more than one day on a batch of work before submitting it.

The already split senses are explained here.

Guidelines

When editing a lin rule, do one of the following:

As general guidelines,