# Response from the Grammatical Framework community

Hello,

The Grammatical Framework (GF) community has been following the development of Abstract Wikipedia with great interest.
This message is based on [https://groups.google.com/g/gf-dev/c/A6lNwZ813b0 a thread on the GF mailing list] and
on my personal opinions ([https://meta.wikimedia.org/wiki/User:Inariksit User:Inariksit]).

## Resources

GF has a [http://www.grammaticalframework.org/lib/doc/synopsis/index.html Resource Grammar Library (RGL)]
for 40 or so languages, and 14 of them have
[https://github.com/GrammaticalFramework/gf-wordnet#readme large-scale lexical resources and extensions for wide-coverage parsing].
The company [https://www.digitalgrammars.com Digital Grammars] (my employer) has been using GF in commercial applications since 2014. 

To quote GF's inventor Aarne Ranta on [https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/eEyLYmfmCQAJ the previously linked thread]:

<blockquote>
My suggestion would have a few items:

- that we develop a high-level API for the purpose, as done in many other NLG projects
- that we make a case study on an area or some areas where there is adequate data. For instance from OpenMath
- that we propagate this as a community challenge
- Digital Grammars can sponsor this with some tools, since we have gained experience from some larger-scale NLG projects
</blockquote>

### Division of labour

I personally would love to spend the next couple of years reading grammar books and encoding basic morphological and syntactic structures of languages like Guarani or Greenlandic into the GF RGL. With those in place, a much wider audience can write _application grammars_, using the RGL via a high-level API.

TODO: promises that the GF community is into this and not just me :-P

### Addressing some concerns from this talk page

Some of the concerns on the talk page are definitely valid.
* It *is* going to take a lot of time. Developing the GF Resource Grammar Library has taken 20 calendar years and (at least) 20 person-years. I think everyone who has a say in the choice of renderer implementation should get familiar with the field: check out other grammar formalisms, such as [http://moin.delph-in.net/GrammarCatalogue HPSG], and you'll see coverage similar to GF's, but no unified API across languages.
* The "Uzbek uncle" situation comes up often in GF grammars when a new language or new concepts are added, so we are prepared for it. The GF language and its module system have constructions that make such cases manageable.
* "Incompatible cultural facts" is a minefield of its own, far beyond the scope of NLG. I personally think we should start with a case study for a limited domain.

On the other hand, the worries about things like ergativity or when to use the subjunctive tell me that the commenters haven't understood just how _abstract_ an abstract syntax can be. To illustrate this, let me quote page 9 of the [http://www.molto-project.eu/sites/default/files/MOLTO_D2.3.pdf Best Practices document]:

<blockquote>
__Linguistic knowledge.__ Even the most trivial natural language grammars involve expert linguistic knowledge.
In the current example, we have, for instance, word inflection and gender agreement shown in French:
_le bar est ouvert_ (“the bar is open”, masculine) vs. _la gare est ouverte_ (“the station is open”, feminine).
As Step 3 in Figure 3 shows, the change of the noun (_bar_ to _gare_) causes an automatic change of the definite article
(_le_ to _la_) and the adjective (_ouvert_ to _ouverte_).
Yet there is no place in the grammar code (Figure 2) that says anything about gender or agreement, and no occurrence
of the words _la_, _le_, _ouverte_! The reason is that such linguistic details are inherited from a library,
the GF Resource Grammar Library (RGL). The RGL guarantees that application programmers can write their grammars
on a high level of abstraction, and with a confidence of getting the linguistic details automatically right.

__Language differences.__ The RGL takes care of the rendering of linguistic structures in different languages. [--]
The renderings are different in different languages, so that e.g. the French definition of the constant _the_Det_
produces a word whose form depends on the noun, whereas Finnish produces no article word at all. These variations,
which are determined by the grammar of each language, are automatically created by the RGL.
However, the example also shows another kind of variation: English and French use adjectives to express “open” and “closed”,
whereas Finnish uses adverbs. This variation is chosen by the grammarian, by picking different RGL types and categories
for the same abstract syntax concepts.
</blockquote>
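
To make the quoted point concrete for readers who have not seen GF code, here is a toy Python analogy. It is not GF and not the RGL API: every name in it (`IsOpen`, `LIBRARY`, `linearize`, the `*_clause` functions) is hypothetical, and the Finnish words are my own illustration rather than taken from the quoted document. It only sketches the division of labour: the application-level rule mentions no gender, article, or word form, while each language module decides those details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsOpen:
    """Abstract syntax: 'this place is open', with no words or genders."""
    place: str  # abstract identifier such as "bar" or "station"


# --- Toy stand-in for a resource library (hypothetical, not the RGL) ---
# Each language module knows its own lexicon and agreement rules.

def english_clause(place: str, state: str) -> str:
    nouns = {"bar": "bar", "station": "station"}
    adjectives = {"open": "open"}
    return f"the {nouns[place]} is {adjectives[state]}"


def french_clause(place: str, state: str) -> str:
    nouns = {"bar": ("bar", "masc"), "station": ("gare", "fem")}
    articles = {"masc": "le", "fem": "la"}
    adjectives = {"open": {"masc": "ouvert", "fem": "ouverte"}}
    word, gender = nouns[place]
    # Article and adjective both agree with the gender of the noun.
    return f"{articles[gender]} {word} est {adjectives[state][gender]}"


def finnish_clause(place: str, state: str) -> str:
    nouns = {"bar": "baari", "station": "asema"}
    adverbs = {"open": "auki"}  # no article, and an adverb instead of an adjective
    return f"{nouns[place]} on {adverbs[state]}"


LIBRARY = {"eng": english_clause, "fre": french_clause, "fin": finnish_clause}


def linearize(tree: IsOpen, lang: str) -> str:
    """Application-level rule: mentions no gender, article, or word form."""
    return LIBRARY[lang](tree.place, "open")


for lang in ("eng", "fre", "fin"):
    for place in ("bar", "station"):
        print(lang, linearize(IsOpen(place), lang))
```

Swapping `IsOpen("bar")` for `IsOpen("station")` changes both the article and the adjective ending in French, yet `linearize` itself is untouched. In real GF the library side is far richer, but the separation is the same in spirit.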

Obviously the GF RGL is far from covering all possible things people might want to say in a Wikipedia article.
But an incomplete tool that covers the most common use cases, or covers a single domain well, is still very useful.

### Non-European and underrepresented languages

Regarding discussions such as https://lists.wikimedia.org/pipermail/wikimedia-l/2020-August/095399.html,
I'm happy to see that you are interested in underrepresented languages.
The GF community has members in South Africa, Uganda and Kenya, doing or having previously done work on Bantu languages.
At the moment (September 2020), there is ongoing development in Zulu, Xhosa and Northern Sotho.

This grammar work has been used in a healthcare application, and you can find a link in
[https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/2h293GzwCgAJ this message]. 

If any of this sounds interesting to you, we can start a direct dialogue with the people involved.

### Morphological resources from Wiktionary inflection tables

With the work of [http://www.lrec-conf.org/proceedings/lrec2016/pdf/1134_Paper.pdf Forsberg and Hulden] and
[https://github.com/keeleleek/pextract2gf-votic Kankainen], it's possible to extract GF resources from
Wiktionary inflection tables.
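
As a rough illustration of the general idea only, here is a minimal Python sketch that turns one inflection table into a reusable paradigm by splitting each form into a shared stem and a suffix. This is a deliberate simplification using a longest common prefix. It does not reproduce the method of the cited paper or the linked tool, and the function names and the sample English table are hypothetical.

```python
import os


def split_table(forms: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Split an inflection table into a shared stem and per-form suffixes."""
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on file paths.
    stem = os.path.commonprefix(list(forms.values()))
    suffixes = {tag: form[len(stem):] for tag, form in forms.items()}
    return stem, suffixes


def apply_paradigm(stem: str, suffixes: dict[str, str]) -> dict[str, str]:
    """Generate a full inflection table for a new stem."""
    return {tag: stem + suffix for tag, suffix in suffixes.items()}


# A hand-written table standing in for one scraped from Wiktionary.
walk = {"inf": "walk", "pres3sg": "walks", "past": "walked", "ger": "walking"}

stem, paradigm = split_table(walk)
talk = apply_paradigm("talk", paradigm)
print(stem, paradigm)
print(talk)
```

A prefix-only split like this breaks down as soon as the stem itself changes between forms, which is exactly the kind of case that more careful extraction methods are designed to handle.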

Quoting [https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/FbDFsFfUCgAJ Kristian Kankainen's message]:

> Since the Wiktionary is a popular place for inflection tables, these could be used for boot-strapping
GF resources for those languages. Moreover, but not related to GF nor Abstract Wikipedia, the
master's thesis generates also FST code and integrates the language into the Giella platform
which provides an automatically derived simple spell-checker for the language contained in the inflection tables.
> Coupling or "boot-strapping" the GF development using available data on Wiktionary could be seen as
a nice touch and would maybe be seen as a positive inter-coupling of different Wikimedia projects.