The GF Cloud: Web Apps and APIs

GF Summer School 2017
- Thomas Hallgren

The wide-coverage translation system
Take a look: Wide Coverage Translation Demo

The wide-coverage translation system

How does it work?

The App grammar: RGL (Resource Grammar Library) + Phrasebook + Chunking
Things that aren't handled by the grammar:
- Segmentation
- Tokenization
- Capitalization

Segmentation

Done in JavaScript
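// Split at sentence-final punctuation, blank lines and bullet markers; the capture group keeps the separators in the result
function split_punct(s) { return s.split(/([.!?]+[ \t\n]+|\n\n+|[ \t\n]*[-•*+#]+[ \t\n]+)/) }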
Multiple segments can be sent to the server and translated in parallel
But this seems ad hoc: it should be part of the grammar...
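A minimal sketch of how the segments could be translated in parallel and reassembled. Because split_punct has exactly one capture group, its result alternates between text segments (even indices) and separators (odd indices), so only the even entries need translating. translateSegment is a hypothetical stand-in for the server request; here it just upper-cases its input so the sketch runs without a server.

```javascript
// Segmentation regex from the slide. The capture group makes split() keep the
// separators, so the result alternates: [seg, sep, seg, sep, ..., seg].
function split_punct(s) {
  return s.split(/([.!?]+[ \t\n]+|\n\n+|[ \t\n]*[-•*+#]+[ \t\n]+)/);
}

// Hypothetical stand-in for a request to the translation server.
// It only upper-cases the text, for demonstration.
async function translateSegment(segment) {
  return segment.toUpperCase();
}

// Translate the non-empty text segments concurrently, pass the separators
// through unchanged, and join everything back in the original order.
async function translateText(text) {
  const parts = split_punct(text);
  const translated = await Promise.all(
    parts.map((part, i) =>
      i % 2 === 0 && part.trim() !== "" ? translateSegment(part) : Promise.resolve(part)
    )
  );
  return translated.join("");
}
```

Keeping the separators means the original layout (blank lines, bullet markers) survives the round trip; Promise.all preserves order even though the requests may complete in any order.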

Tokenization

Done in the server
- .../App14.pgf?command=c-translate&lexer=text&unlexer=text...
Separates punctuation from words
- Parentheses and quotes?
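To make "separates punctuation from words" concrete, here is an illustrative sketch of a punctuation-splitting lexer. It is not the GF server's actual lexer=text implementation; the PUNCT set is an assumption, and it deliberately includes parentheses and quotes, which is exactly the open question above.

```javascript
// Assumed set of characters that become separate tokens when they appear at
// the start or end of a whitespace-delimited word.
const PUNCT = new Set([".", ",", ";", ":", "!", "?", "(", ")", "\"", "'"]);

// Split on whitespace, then peel leading and trailing punctuation off each
// word. Word-internal characters (as in "I'm") are left alone.
function lexText(s) {
  const tokens = [];
  for (const word of s.split(/\s+/).filter(w => w !== "")) {
    let start = 0;
    let end = word.length;
    const before = [];
    const after = [];
    while (start < end && PUNCT.has(word[start])) before.push(word[start++]);
    while (end > start && PUNCT.has(word[end - 1])) after.unshift(word[--end]);
    tokens.push(...before);
    if (start < end) tokens.push(word.slice(start, end));
    tokens.push(...after);
  }
  return tokens;
}
```

For example, lexText('He said: "I\'m fine."') gives He, said, :, ", I'm, fine, ., ". Treating the apostrophe as punctuation would also split a trailing possessive such as "dogs'", which shows why quotes are a judgment call.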

Tokenization

Was necessary before
Nowadays, grammars could be rewritten with the BIND, SOFT_BIND and SOFT_SPACE tokens instead.
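When the grammar marks where tokens are glued together, the unlexer no longer needs punctuation heuristics: it only has to honour the markers. A minimal sketch, assuming the linearizer's output is an array of strings in which the special tokens appear as the sentinel strings "BIND", "SOFT_BIND" and "SOFT_SPACE" (this representation is hypothetical). SOFT_SPACE is rendered as an ordinary space, on the assumption that the space it marks is optional only when parsing.

```javascript
// Hypothetical sentinel strings for the special tokens.
const BIND = "BIND";
const SOFT_BIND = "SOFT_BIND";
const SOFT_SPACE = "SOFT_SPACE";

// Join tokens with single spaces, except where BIND or SOFT_BIND asks for the
// neighbouring tokens to be glued together.
function unlexTokens(tokens) {
  let out = "";
  let glue = true; // no space before the very first token
  for (const tok of tokens) {
    if (tok === BIND || tok === SOFT_BIND) {
      glue = true;
    } else if (tok === SOFT_SPACE) {
      glue = false; // assumption: printed as a normal space
    } else {
      out += (glue ? "" : " ") + tok;
      glue = false;
    }
  }
  return out;
}
```

For instance, ["hello", BIND, ",", "world", SOFT_BIND, "!"] becomes "hello, world!". The difference between BIND and SOFT_BIND shows up on the parsing side, where SOFT_BIND also accepts input with a space in between.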

Capitalization

Done in the server, as part of tokenization with lexer=text
Need to change the first word of a sentence to lower case
- Causes problems if the first word is a name, or, in English, "I", "I'm", etc.
- Keep upper case if the word is all caps
- Keep upper case if it is a valid word in the grammar (lookupMorpho)
  - Misses multi-word expressions?
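The rules above can be sketched as a small function. isKnownWord is a hypothetical stand-in for a lookupMorpho query: it should return true if the grammar knows the word exactly as written, capital letter included. This is an illustration of the heuristic, not the server's code.

```javascript
// Lower-case the first letter of a sentence's first token, unless the token
// is all caps ("NATO", "I") or the grammar recognizes the capitalized form.
function decapitalizeFirst(tokens, isKnownWord) {
  if (tokens.length === 0) return tokens;
  const [first, ...rest] = tokens;
  const hasLetter = /\p{L}/u.test(first);
  const allCaps = hasLetter && first === first.toUpperCase();
  if (allCaps || isKnownWord(first)) return tokens;
  return [first.charAt(0).toLowerCase() + first.slice(1), ...rest];
}
```

With a lexicon containing "Paris" and "I'm", the sentence "The cat" becomes "the cat", while "Paris is big", "NATO met" and "I am" are left alone. A sentence starting with "New York" shows the multi-word problem: if only the phrase "New York" is in the lexicon, the single token "New" is not found and gets lowered to "new".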

Capitalization

Was necessary before
Nowadays, grammars could be rewritten to use the CAPIT token instead.
- For robustness, maybe there should be a SOFT_CAPIT token too?
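A sketch of both sides of the idea, assuming tokens are plain strings with "CAPIT" as a sentinel (a hypothetical representation). On output, CAPIT upper-cases the first letter of the following token. SOFT_CAPIT does not exist; the parsing-side function only illustrates what a "soft" variant might mean, under the assumption that a strict CAPIT requires the capitalized form in the input.

```javascript
const CAPIT = "CAPIT"; // hypothetical sentinel for the CAPIT token

// Output side: capitalize the token that follows each CAPIT.
function applyCapit(tokens) {
  const out = [];
  let capitalizeNext = false;
  for (const tok of tokens) {
    if (tok === CAPIT) {
      capitalizeNext = true;
    } else {
      out.push(capitalizeNext ? tok.charAt(0).toUpperCase() + tok.slice(1) : tok);
      capitalizeNext = false;
    }
  }
  return out;
}

// Parsing side (hypothetical): a strict match accepts only the capitalized
// form, while a soft match also accepts the word as it is in the lexicon.
function matchesCapitalized(expected, input, soft) {
  const capitalized = expected.charAt(0).toUpperCase() + expected.slice(1);
  return input === capitalized || (soft && input === expected);
}
```

Under these assumptions, a strict match would reject an all-lowercase "the cat ..." typed by a user, whereas the soft variant would accept it while still producing "The" on output.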