[02:32:55] *** Joins: spectre (~fran@115.114.202.84.customer.cdi.no)
[06:04:28] inariksit: could you take a look at this?
[06:04:30] http://paste2.org/_2DaYAJ2F
[06:05:06] i've ported the morph analyser to the apertium stream format, made it run in the shell
[06:05:29] works with arguments, pipes and stdin
[06:05:51] any suggestions? the code is probably pretty untidy
[07:33:18] looks fine :) you can also do case analysis on args
[07:33:23] e.g. case args of
[07:33:39] [p, l] -> do ...
[07:33:52] [p, l, s] -> do ...
[07:34:00] then you have all options on the same level
[07:35:29] nice, I didn't think of that :D
[07:35:35] functors and applicatives look so pretty
[07:35:38] ^^
[07:36:16] what do you think about working on morph disambiguation by rejecting all analyses that don't have a parse tree?
[07:38:52] so what's the purpose? to make a tool that apertium users could use and it would output apertium resources from gf resources?
[07:39:05] yep
[07:39:10] wait, i'll link you..
[07:39:39] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Integration_and_debugging_tools_for_Grammatical_Framework
[07:40:44] right, yeah
[07:40:52] that's the coding challenge you're doing right now?
[07:40:58] yeah
[07:41:16] yeah
[07:42:56] you were planning to mentor this one, weren't you?
[07:43:33] yep :-D should get some ideas myself too
[07:43:58] but anyway, for your question, that's worth trying :) and I can ask other people here too if they have ideas
[07:44:10] awesome :D i think spectre had mentioned that he wanted to port more tools than were mentioned on the ideas page
[07:44:14] cool, i'll give it a shot
[07:44:17] I did mention it (and aarne is following the mailing list)
[07:44:30] aarne is such an apertium fan, he must have some ideas already
[07:44:45] haha
[07:44:59] and yeah, if we are not at the moment having problem of ideas where to choose, I think implementing that is a good idea
[07:45:10] GF was his idea, wasn't it?
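A minimal sketch of the "case analysis on args" suggestion from 07:33 above. The pasted code is not reproduced in the log, so everything here is an assumption: p is taken to be a grammar path, l a language name and s an optional sentence, and Options, parseArgs and the usage string are hypothetical names chosen for illustration.

```haskell
module Main where

import System.Environment (getArgs)

-- Hypothetical option record; the field meanings are assumptions,
-- the log never says what p, l and s stand for.
data Options = Options
  { grammarPath :: FilePath
  , language    :: String
  , sentence    :: Maybe String -- Nothing means "read from stdin"
  } deriving (Eq, Show)

-- Every accepted argument shape sits at the same level of one case
-- expression, so adding a new shape is a one-line change.
parseArgs :: [String] -> Either String Options
parseArgs args = case args of
  [p, l]    -> Right (Options p l Nothing)
  [p, l, s] -> Right (Options p l (Just s))
  _         -> Left "usage: analyser GRAMMAR LANG [SENTENCE]"

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Left usage -> putStrLn usage
    Right opts -> do
      -- Fall back to stdin when no sentence was given on the command line.
      text <- maybe getContents return (sentence opts)
      putStrLn (language opts ++ ": " ++ text)
```

Keeping parseArgs pure means the argument handling can be checked without touching IO; only main decides whether to read stdin.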
[07:45:14] btw I don't know if this is helpful, but https://groups.google.com/forum/#!searchin/gf-dev/C$20library/gf-dev/jNrUTdBbq7Y/Zdvm0HKmPqgJ
[07:45:20] aarne's? yes :)
[07:45:34] cool
[07:46:41] nice, that looks helpful, inariksit
[07:46:46] i'll take a closer look when I get home
[07:47:13] :)
[07:47:31] also from your code, (\acc x -> acc ++ x) can be replaced with just (++)
[07:47:40] hm?
[07:47:43] oh yeah
[07:47:44] my bad
[07:47:53] thanks :)
[07:52:02] http://paste2.org/_eb6FImZ2
[07:52:09] with a big grammar :-D
[07:52:49] you can see some rubbish analyses
[07:52:51] ^is/thirsty_VP
[07:53:07] the copula is a member of many different constructions
[07:53:11] there's this module Construction.gf
[07:53:30] with stuff like being thirsty or hungry, or weather expressions, as VPs
[07:53:40] because they are expressed differently in different languages
[07:54:38] haha and also chunks
[07:54:39] CleftAdv<(ODir>/copula_Chunk
[07:54:48] (sorry, just the last bit is relevant)
[07:54:56] haha, oh dear
[07:55:28] where's the weather bit coming from?
[07:56:12] oh, I see
[07:56:15] go to lib/src/english/Construction.gf, there are stuff like
[07:56:16] weather_adjCl ap = mkCl (mkVP (lin AP ap)) ;
[07:56:16]
[07:56:16] is_right_VP = mkVP (ParadigmsEng.mkA "right") ;
[07:56:16] is_wrong_VP = mkVP (ParadigmsEng.mkA "wrong") ;
[07:56:38] (mkVP with just complement constructs by default a copula clause with that complement)
[07:57:16] this is for the "to be" clause, is it?
[07:57:16] but you can exclude that module (or any module) easily, just comment it out in the top level file
[07:57:19] yes
[07:57:34] okay, this makes sense (I think)
[07:57:41] e.g.
TranslateEng starts like this
[07:57:41] concrete TranslateEng of Translate = TenseX - [Pol, PNeg, PPos], CatEng,
[07:57:54] so just go there and comment out ConstructionEng
[07:57:59] then it compiles it without it
[07:58:15] also I don't know if the chunks are going to be a problem, but you can also comment out chunks
[07:58:31] or just make your own DisamMorphoEng, where you include the modules you want
[07:58:32] and Construction specifically works to build thirst/hunger expressions and the like?
[07:59:21] yeah, or other expressions that are often different in different languages
[07:59:27] apertium delimits chunks with {}
[07:59:31] so I could do that, I suppose
[07:59:58] so that an expression gets parsed in one language as a VP, instead of a combination of V and NP
[08:00:10] and then it can be translated non-literally
[08:00:17] okay, I am thirsty = ich habe durst in German. so you mean the sort of expressions where the verb changes/becomes another POS?
[08:00:29] yes
[08:00:34] or the whole expression
[08:01:00] we had a nice example, in english " is a few sandwiches short of a picnic"
[08:01:50] and in finnish there is an expression " doesn't have all the sandwiches in a picnic" (well, the canonical example doesn't use sandwich and picnic, but just to keep up with the scheme)
[08:02:09] and then we parameterised the person who the idiom is talking about, the sandwich and the picnic
[08:02:48] ooh
[08:02:52] and transferred the expression
[08:03:38] yeah
[08:03:46] nice :D
[08:03:46] so you can have complete nonsense like
[08:03:47] Lang> p "John is a few babies short of a car"
[08:03:48] PhrUtt NoPConj (UttS (few_X_short_of_Y (UsePN john_PN) (UseN baby_N) (UseN car_N))) NoVoc
[08:04:09] and the finnish translation would turn out as "John doesn't have all the babies in his car"
[08:04:15] damn
[08:04:47] this is kind of/sort of/very distantly similar to paradigms, I suppose
[08:04:54] this construction needs to be a S, because negation happens so late in
the tree
[08:05:10] because in the expression in english, it's positive but in finnish negative
[08:05:11] in the sense that there are specific constructions for certain expressions
[08:05:18] "is short of" vs. "doesn't have"
[08:05:25] I see
[08:05:27] what do you mean by paradigms?
[08:05:40] paradigms in apertium, I mean
[08:05:47] ah ok, I don't know those
[08:05:57] what are they?
[08:06:38] well, they're tags that specify how certain words inflect. then once you have the inflection patterns for a certain word, you can make another word that inflects similarly use the same pattern
[08:06:49] it's similar in the sense that there are 'unique' paradigms
[08:07:13] and there seem to be unique constructions here - like few_X_short_of_Y
[08:09:53] ah ok, so the apertium paradigm is like a paradigm in the linguistic sense? :)
[08:10:32] I'm trying to think if that analogy makes sense; so a paradigm takes as an input a word form and produces an inflection table
[08:10:38] it's a way to pack information
[08:11:04] you can say "dog", regular paradigm, and not need to say "dog, dogs, dog's, dogs'"
[08:11:09] yep, it's a pretty shitty analogy :P
[08:11:16] yeah
[08:11:35] okay, brb
[08:11:38] ok :)
[08:11:40] thanks for the help, inariksit :)
[08:11:45] no problem!
[09:40:38] *** Quits: jmvanel (~jmvanel@243.0.88.79.rev.sfr.net) (Read error: Connection reset by peer)
[10:13:56] *** Joins: jmvanel (~jmvanel@78.193.21.40)
[12:31:02] Lang> p "John is a few babies short of a car"
[12:31:02] :D
[12:31:40] vin-ivar, sounds like you're on the right track
[12:32:29] what do you think about working on morph disambiguation by rejecting all analyses that don't have a parse tree?
[12:33:09] the idea of this tool is to be able to get an idea of how well GF grammars perform on the morphological disambiguation task, e.g.
if you can use the grammars to do morphological disambiguation
[12:33:58] to provide a baseline for other stuff you might want to do
[12:41:12] cool
[12:41:23] i'll draw up something ghetto today
[12:41:44] :D
[13:33:07] sounds good
[13:33:13] * inariksit in lab supervision again
[13:34:03] btw spectre or anyone, if you're interested to try my CG implementation, here's something that should work with just cabal configure && cabal build https://github.com/inariksit/cgsat
[13:34:15] ie. no magical symlinks needed :-P
[13:34:18] :)
[13:34:47] (though for any practical real world data, be warned: token Punct ["!?.,:"] )
[13:34:54] hmm, is that the same as "cgstuff" ?
[13:34:57] yeah
[13:35:02] just removed the stuff
[13:35:10] Warning: The package list for 'hackage.haskell.org' is 149 days old.
[13:35:10] D:
[13:35:14] ok
[13:35:21] installing minisat
[13:35:52] $ cabal build
[13:35:52] Building cgsat-0.1.0.0...
[13:35:52] Preprocessing library cgsat-0.1.0.0...
[13:35:52] cabal: can't find source for BNFC/ErrM in ., dist/build/autogen
[13:35:59] grrr
[13:36:06] lowercase problem?
[13:36:12] no
[13:36:21] $ ln -s bnfc/ BNFC
[13:36:22] i did this
[13:36:24] now it's building :P
[13:36:27] hahah ok
[13:36:29] shit shit
[13:36:30] :D
[13:36:31] :D
[13:36:39] +1 magical symlink :P
[13:36:55] I tried to git mv bnfc BNFC but it said that source and target are the same
[13:36:59] so I was just like ok whatever
[13:37:01] and it worked
[13:37:22] hmm, so how does it work ?
[13:37:29] also I should make some fancier makefile so that if I update the .cf files, it would just build modules called BNFC.Foo instead of Foo
[13:37:37] because it's stupid to store autogenerated files >_>
[13:37:44] $ dist/build/cgsat/cgsat
[13:37:44] ?
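A rough sketch of the disambiguation idea raised at 12:32 above (reject every analysis that takes part in no parse tree). This is not the tool discussed in the log and does not call GF: the parser is passed in as a plain predicate, toyParse is a made-up stand-in that accepts a single tag sequence, and the tag names are illustrative only.

```haskell
module Main where

import Data.List (nub)

-- One morphological analysis of a token, reduced here to a bare tag.
type Analysis = String

-- A surface form paired with every analysis proposed for it.
type Cohort = (String, [Analysis])

-- Keep only the analyses that occur in at least one reading of the
-- whole sentence accepted by the parser predicate.
disambiguate :: ([Analysis] -> Bool) -> [Cohort] -> [Cohort]
disambiguate hasParse cohorts =
  [ (form, [a | a <- nub analyses, a `elem` survivors i])
  | (i, (form, analyses)) <- zip [0 ..] cohorts
  ]
  where
    -- mapM in the list monad enumerates every way of picking one
    -- analysis per token (the Cartesian product of the cohorts).
    readings = filter hasParse (mapM snd cohorts)
    survivors i = map (!! i) readings

-- Hypothetical stand-in for a real parser: only det-n-vblex parses.
toyParse :: [Analysis] -> Bool
toyParse tags = tags == ["det", "n", "vblex"]

example :: [Cohort]
example =
  [ ("the", ["det"])
  , ("bear", ["n", "vblex"])
  , ("sleeps", ["n", "vblex"])
  ]

main :: IO ()
main = mapM_ print (disambiguate toyParse example)
```

With the toy predicate, "bear" keeps only n and "sleeps" keeps only vblex. Two caveats: the number of readings grows exponentially with sentence length, and a sentence with no accepted reading loses all its analyses, so a real tool would probably fall back to the original cohorts in that case.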
[13:37:48] yeah
[13:37:54] if you give "test" as argument
[13:38:09] it will disambiguate (or fail in doing so) some simple "the bear sleeps in the house" sentences
[13:38:13] no solution ;__;
[13:38:16] ;___;
[13:38:35] if you give it a rules file and an apertium morpho tagged file (examples in data/), it parses them and disambiguates
[13:38:42] $ dist/build/cgsat/cgsat data/hun_cg2.rlx data/morph-output.txt
[13:38:44] ah also, I haven't implemented LINK rules yet
[13:38:48] ...hargle bargle..
[13:38:50] so that's probably one reason
[13:38:51] Remove [["anyjuk"]] (POS (C (Barrier (-1) [[]]) (False,[[]])))
[13:38:51] No solution
[13:38:51] -----------
[13:38:55] ah shit that's my debug output
[13:39:08] ah
[13:39:27] those No solution stuffs probably are due to LINK rules not being implemented and the actual stuffs using them
[13:39:46] also, the hungarian files in data/ have unicode characters carefully removed
[13:39:52] D:
[13:39:55] haha
[13:39:57] i was just going to suggest
[13:40:00] adding a russian example
[13:40:03] but if no unicode then D:
[13:40:13] well that seemed to be a problem of bnfc
[13:40:20] of course I can parse them in a million other ways
[13:40:31] s/I can/one could/
[13:41:13] I imagine though that there might be some magical way to get bnfc understand unicode
[13:41:17] yeah
[13:41:21] it's probably a flag
[13:41:23] like -utf8
[13:41:25] or something
[13:41:30] yeah sounds plausible
[13:52:34] googling "bnfc unicode" suggests that bnfc is ok but some lexer tools that bnfc uses, don't
[13:52:40] so then I was googling "happy unicode"
[13:52:42] :D
[13:52:48] http://unicodeemoticons.com/
[13:52:51] first result
[13:52:55] ◕‿◕
[13:52:57] i like this one
[13:53:09] o hei inari (づ。◕‿‿◕。)づ
[13:53:20] ohaio!
^__^
[13:53:36] 凸(-_-)凸 <-- this one is flammie
[14:10:32] spectre: haha
[14:11:41] ^__^
[15:53:59] *** Quits: spectre (~fran@115.114.202.84.customer.cdi.no) (Ping timeout: 245 seconds)
[15:54:28] the fuck
[15:54:29] hahaha
[15:54:50] okay, inariksit, any suggestions for a grammar more complicated than Foods but less complicated than Translator? :P
[15:56:19] just the basic Lang
[15:56:25] with possibly Construction excluded
[15:56:30] or take some mini resource
[15:56:56] e.g. here https://github.com/GrammaticalFramework/gf-contrib/tree/master/extmini
[16:11:17] nice, cheers
[16:11:26] i'll get to work in a bit
[16:38:07] *** Joins: spectre (~fran@dhcp856-ans.wifi.uit.no)
[16:59:19] *** Quits: spectre (~fran@dhcp856-ans.wifi.uit.no) (Ping timeout: 250 seconds)
[17:03:29] *** Joins: spectre (~fran@dhcp856-ans.wifi.uit.no)
[17:20:08] *** Quits: spectre (~fran@dhcp856-ans.wifi.uit.no) (Ping timeout: 246 seconds)
[18:20:47] *** Quits: crazydiamond (~crazydiam@178.141.72.109) (Ping timeout: 250 seconds)
[19:20:39] *** Joins: spectre (~fran@129.242.94.208)
[19:32:11] inariksit: lexical error on the ò of sarò in ResIta.gf
[19:32:34] but there isn't one for avrò
[19:32:40] what do?
[19:32:56] oh, weird
[19:34:36] you can look the codepoint of a character in emacs by C-x =
[19:35:20] yep they're different
[19:36:26] are they? tried replacing it, but no luck
[19:36:32] wait, let me check
[19:37:02] argh sorry no :-D I had my cursor on the " on the other one
[19:37:42] so if you only replace sarò with something else, it doesn't complain about avrò?
[19:37:51] or just that with both in place, it only complains about one?
[19:38:01] that's normal in gf, it just stops working after finding one error
[19:38:05] even though there are more
[19:38:32] well
[19:38:34] it's a bit complicated
[19:38:58] initially, it was mkVerb and a bunch of verb - split across two lines
[19:39:07] verbs
[19:39:15] and the error was at the end of the first line
[19:39:26] it just said "lexical error"
[19:39:29] so I combined the two lines
[19:39:35] then the error was on the ò
[19:39:38] replaced it with o
[19:39:45] now the error's moved to the end of the line again
[19:39:47] D:
[19:39:54] it works for me when I saved all the files as utf-8
[19:40:08] in emacs: C-x RET f
[19:40:15] and then type utf-8
[19:40:28] vim master race
[19:40:30] but okay
[19:40:33] let me install emacs :P
[19:40:44] lol "vim master race" :P
[19:40:47] what's the prob ?
[19:40:49] haha
[19:40:57] you know it's true
[19:41:04] i use vim >_>
[19:41:12] hehe, I'm sure vim can do all stuff too, but I am only familiar with emacs
[19:41:14] o/
[19:41:16] anyway, what am i doing ?
[19:41:19] you have an err-or ?
[19:41:41] oh shit, inariksit, svn doesn't mess with encodings if you use it to checkout from github, does it?
[19:41:44] indeed
[19:42:30] I don't know that
[19:42:38] but many of the files in gf-contrib are old
[19:42:40] vin-ivar, it shouldn't do
[19:43:01] hmm, latest darcs doesn't compile for me in lib/src/italian
[19:43:06] but not encoding related
[19:43:20] ah sorry I think we're talking about the extmini in gf-contrib
[19:43:27] not the actual italian resource grammar
[19:43:37] we are
[19:43:40] yeah
[19:43:43] ah ok
[19:43:49] spectre, https://github.com/GrammaticalFramework/gf-contrib/tree/master/extmini
[19:44:00] hmm
[19:44:05] 'sec
[19:44:51] ok
[19:44:54] now i get the error
[19:45:13] missing ';' ?
[19:45:19] added it
[19:45:19] ah no
[19:45:21] no luck
[19:45:38] I also got lexical error before converting all 3 *Ita.gf files in utf-8
[19:45:40] it's weird because there's a mkVerb above it that looks exactly the same
[19:45:48] but no error D:
[19:46:04] oh yeah
[19:46:07] they're in iso
[19:46:24] I'm quite sure that both would cause error, it just doesn't show the second error after finding one
[19:46:36] now it works
[19:46:37] vin-ivar,
[19:46:38] $ for i in *.gf; do cat $i | iconv -f latin1 -t utf-8 > $i~ ; mv $i~ $i; done
[19:46:39] run this
[19:46:51] but if you've faffed around with anything
[19:46:56] you'll probably want to check it out from scratch
[19:46:57] command line > emacs
[19:47:04] ^_^
[19:47:07] git checkout -- *Ita.gf
[19:47:08] bash for loops \o/
[19:48:57] nice, I hadn't heard of iconv
[19:49:01] checking the whole thing out again
[19:49:09] iconv + uconv ++
[19:49:26] pain in the ass on a 512k connection T_T
[19:49:33] just do git checkout -- *Ita.gf
[19:49:38] yeah
[19:50:25] it works!
[19:50:27] \o/
[19:50:32] fuckin hi-5 o/
[19:50:33] \o/
[19:50:38] \o
[19:50:40] cheers spectre, thanks
[19:58:24] *** Quits: spectre (~fran@129.242.94.208) (Ping timeout: 256 seconds)
[20:15:11] *** Quits: jmvanel (~jmvanel@78.193.21.40) (Ping timeout: 250 seconds)
[20:44:39] *** Joins: jmvanel (~jmvanel@66.0.88.79.rev.sfr.net)
[20:51:17] *** Joins: spectre (~fran@193.212.24.132)
[21:26:19] *** Quits: spectre (~fran@193.212.24.132) (Read error: Connection reset by peer)
[21:26:40] *** Joins: spectei (~fran@unaffiliated/spectie)
[21:32:08] *** Quits: spectei (~fran@unaffiliated/spectie) (Ping timeout: 272 seconds)
[21:33:10] *** Joins: crazydiamond (~crazydiam@178.141.71.207)
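The iconv loop at 19:46:38 above re-encodes each .gf file from Latin-1 to UTF-8. A minimal Haskell sketch of the same step follows, using only System.IO from base. The function name latin1ToUtf8 and the command-line shape are assumptions made for illustration, and unlike the shell loop it writes to a separate destination instead of replacing the file in place.

```haskell
module Main where

import System.Environment (getArgs)
import System.IO

-- Read src as Latin-1 and write the same text to dst as UTF-8.
-- The source file is left untouched.
latin1ToUtf8 :: FilePath -> FilePath -> IO ()
latin1ToUtf8 src dst =
  withFile src ReadMode $ \hIn -> do
    hSetEncoding hIn latin1
    text <- hGetContents hIn
    -- The write happens while hIn is still open, so the lazily read
    -- contents are fully consumed before the input handle closes.
    withFile dst WriteMode $ \hOut -> do
      hSetEncoding hOut utf8
      hPutStr hOut text

main :: IO ()
main = do
  args <- getArgs
  case args of
    [src, dst] -> latin1ToUtf8 src dst
    _          -> putStrLn "usage: recode SRC DST"
```

Latin-1 decoding accepts any byte sequence, so running this (or the iconv loop) on a file that is already UTF-8 silently double-encodes the accented characters. That is one reason to restore pristine copies with git checkout before converting, as suggested in the log.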