[02:39:34] *** Joins: drbean (~drbean@124.219.83.228)
[04:14:24] *** Quits: jmvanel (~jmvanel@79.88.0.66) (Ping timeout: 245 seconds)
[08:37:56] *** Joins: jmvanel (~jmvanel@66.0.88.79.rev.sfr.net)
[09:02:06] *** Joins: crazydiamond (~crazydiam@178.141.72.158)
[09:51:16] *** Quits: vin-ivar (~vinit@122.170.53.73) (Quit: WeeChat 1.0.1)
[09:56:26] *** Joins: vin-ivar (~vinit@122.170.53.73)
[10:10:05] phew, fixed everything. had to go thermonuclear on ghc
[10:33:58] *** Quits: crazydiamond (~crazydiam@178.141.72.158) (Remote host closed the connection)
[12:08:02] *** Quits: spectre (~fran@115.114.202.84.customer.cdi.no) (Ping timeout: 252 seconds)
[17:21:49] *** Joins: spectre (~fran@115.114.202.84.customer.cdi.no)
[17:43:35] *** Joins: vinit-ivar (~vinit@122.169.12.162)
[17:45:56] *** Quits: vin-ivar (~vinit@122.170.53.73) (Ping timeout: 264 seconds)
[20:08:55] *** vinit-ivar is now known as vin-ivar
[20:09:04] folks
[20:09:08] ~/Dev/FOSS/GF-master/extmini/gf-apertium master $ echo "I love the sleep" | ./MorphAnalyser.hs ../Lang.pgf LangEng | ./MorphDisambiguator.hs ../Lang.pgf LangEng
[20:09:10] MorphAnalyser.hs: <stdin>: hGetLine: end of file
[20:09:12] ^I/i_NP$ ^love/love_V2$ ^the/the_Det$ ^sleep/sleep_N$
[20:09:14] MorphDisambiguator.hs: <stdin>: hGetLine: end of file
[20:09:26] \o/
[20:10:19] i split the thing into two scripts and tidied it up a bit, the disambiguator converts the apertium stream format to the original string
[20:34:47] spectre: what d'you reckon I should work on next? lexical selection?
[20:35:26] can you try it with another language
[20:35:28] to see what happens
[20:35:37] sure
[20:35:44] e.g. do we get english (interlingua) in the output
[20:35:44] gimme a bit
[20:35:47] or original lang
[20:46:44] you get the analyses in english
[20:47:10] ok
[20:47:27] that's how the lexicon seems to be defined anyway
[20:47:35] echo "I love the sleep" | ./MorphAnalyser.hs ../Lang.pgf LangEng
[20:47:39] what do you get out of this ?
[20:47:41] like, sleep_N = mkN "sonno" ;
[20:47:42] do you get multiple lemmas ?
[20:47:48] yeah
[20:47:50] for sleep
[20:47:54] I added sleep as a noun
[20:48:19] the lexical selection would be interesting
[20:48:22] you can have e.g.
[20:48:28] sleep_N = mkN "sonno" | mkN "hargle" ;
[20:48:41] ^I/i_NP$ ^love/love_V2$ ^the/the_Det$ ^sleep/sleep_V/sleep_N$
[20:48:41] and see if you can get it to output >1 translation
[20:48:48] yeah
[20:48:50] it sounds interesting
[20:49:00] isn't lexical selection handled statistically?
[20:49:28] so
[20:49:30] you can have e.g.
[20:49:39] sleep_N = mkN "sonno" | mkN "hargle" ; (ambiguous in SL generation)
[20:49:40] or
[20:49:48] sleep1_N = mkN "sonno" ;
[20:49:52] sleep2_N = mkN "sonno" ;
[20:49:59] (ambiguous in SL analysis)
[20:50:03] i don't really know how it works
[20:50:13] lexical selection could be to do with both
[20:50:18] i think i had used the latter at one point
[20:50:27] although the ideal is obviously that the interlingua contains all the concepts
[20:50:34] to be able to distinguish the sentences
[20:50:47] and only pure synonyms (lol kilgarrif there are no synonyms) are with the | way
[20:51:31] brb
[20:52:01] hm, how would the interlingua store concepts related to lexical selection, though
[20:52:17] it wouldn't
[20:52:23] lexical selection would then be part of parsing
[20:52:26] i guess
[20:52:49] you would have to e.g. alter the parse tree probs (or have some other disambiguation module)
[20:52:55] to give
[20:53:01] right, so there'd have to be some sort of information about what word sense to use. how do you do that without probabilities or rules?
[20:53:01] a score for
[20:53:10] p(estación|station_N)
[20:53:13] p(estación|season_N)
[20:53:15] yeah, that's what i was thinking
[20:53:24] because if you have
[20:53:37] station_N = mkN "estació" ;
[20:53:44] season_N = mkN "estació" ;
[20:53:47] i'm not sure how it works at the moment
[20:54:00] probably it is just conditioned on the probabilities of station_N and season_N in the penntreebank
[20:54:02] it could be contextual, I suppose
[20:54:10] yes, that's the idea of lexical selection
[20:54:11] more helpful than just unigram probabilities
[20:54:15] right
[20:54:22] that you use contexts to modify the parse tree probabilities
[20:54:23] or something
[20:54:30] aye
[20:54:37] e.g. if you have estació seca
[20:54:52] then you bump estación->season_N
[20:54:58] but if you have estació de tren
[20:55:06] then you bump estació->station_N
[20:55:08] but then you also need corpora to get probabilities
[20:55:16] though I suppose I could just ghetto my own for starters
[20:55:19] vin-ivar, you can do it either supervised (if you have parallel corpora)
[20:55:23] or unsupervised (if you don't)
[20:55:34] msg me your email addr
[20:55:39] okay
[20:56:43] will send you a paper
[20:57:03] sent
[20:57:28] got it
[20:57:42] looks nice
[20:58:03] bbiab :)
[20:58:11] cool
[21:20:20] spectre: i'll take a look at the paper tomorrow, about to crash now
[21:20:29] thanks, it looks really cool :D
[22:03:20] *** Quits: vin-ivar (~vinit@122.169.12.162) (Ping timeout: 264 seconds)
[23:47:59] *** Joins: crazydiamond (~crazydiam@178.141.71.146)
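
Editor's note: the "bump the sense when the context matches" idea from the 20:54 exchange can be sketched roughly as below. This is only an illustration in Python, not the Haskell scripts discussed in the log and not how GF or Apertium actually do lexical selection. The prior values, the rule table and the names PRIORS, CONTEXT_BUMPS and select_sense are all made up for the example.

```python
# Hypothetical sketch of context-sensitive lexical selection for the
# ambiguous word "estació". All numbers and rules are invented.

# Assumed unigram priors for the two abstract-syntax senses
# (standing in for counts taken from some treebank).
PRIORS = {"station_N": 0.6, "season_N": 0.4}

# Assumed context rules: seeing a context word multiplies the score
# of one sense by a factor.
CONTEXT_BUMPS = {
    "seca": {"season_N": 4.0},   # "estació seca" -> season
    "tren": {"station_N": 4.0},  # "estació de tren" -> station
}


def select_sense(context_words, priors=PRIORS, bumps=CONTEXT_BUMPS):
    """Return (best_sense, normalised_scores) given the surrounding words."""
    scores = dict(priors)
    for word in context_words:
        # Words with no rule leave the scores unchanged.
        for sense, factor in bumps.get(word, {}).items():
            scores[sense] *= factor
    total = sum(scores.values())
    normalised = {sense: score / total for sense, score in scores.items()}
    best = max(normalised, key=normalised.get)
    return best, normalised


if __name__ == "__main__":
    print(select_sense(["seca"]))        # context favours season_N
    print(select_sense(["de", "tren"]))  # context favours station_N
    print(select_sense([]))              # falls back to the unigram prior
```

With no matching context word the choice falls back to the unigram prior, which is the baseline behaviour guessed at in the log. In a supervised setup the factors would presumably be estimated from a parallel corpus rather than written by hand.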