ruote tmp/log_2013-03-08.html

2013-03-08 09:07:58 utc jmettraux whitequark: hello and welcome to #ruote
2013-03-08 09:09:15 utc whitequark jmettraux: hello
2013-03-08 09:09:19 utc whitequark I've been reading about neg
2013-03-08 09:10:21 utc whitequark I've tried to implement something similar, but with one important difference: I wished to have PEG syntax, LR semantics and PEG non-ambiguity all at once
2013-03-08 09:10:39 utc whitequark (I'm fairly sure that at least the first two goals together can be achieved)
2013-03-08 09:11:32 utc whitequark the problem with PEG/packrat parsers is either the exponential time requirement on pathological inputs or enormous space requirements. (I'm not sure about exact complexity, but this is a significant practical problem.)
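
The time/space trade-off mentioned above can be illustrated with a small sketch. The grammar, class and method names here are hypothetical and not taken from neg or any other library: a backtracking recognizer for `E <- T '+' E / T '-' E / T` and `T <- '(' E ')' / 'a'` re-parses the same `T` up to three times per nesting level, while a packrat-style memo table avoids the repeated work at the cost of one cache entry per (rule, position) pair.

```ruby
# Hypothetical recognizer for:
#   E <- T '+' E / T '-' E / T
#   T <- '(' E ')' / 'a'
# Each rule method returns the position after the match, or nil on failure.
class TinyPeg
  attr_reader :calls

  def initialize(input, memoize:)
    @input = input
    @memo = memoize ? {} : nil # packrat table: one entry per [rule, position]
    @calls = 0                 # number of rule bodies actually evaluated
  end

  def expr(pos)
    cached(:expr, pos) do
      # Ordered choice: every alternative starts by parsing T again.
      try_binary(pos, '+') || try_binary(pos, '-') || term(pos)
    end
  end

  def term(pos)
    cached(:term, pos) do
      if @input[pos] == '('
        inner_end = expr(pos + 1)
        inner_end && @input[inner_end] == ')' ? inner_end + 1 : nil
      elsif @input[pos] == 'a'
        pos + 1
      end
    end
  end

  private

  def try_binary(pos, operator)
    left_end = term(pos)
    return nil unless left_end && @input[left_end] == operator
    expr(left_end + 1)
  end

  def cached(rule, pos)
    key = [rule, pos]
    return @memo[key] if @memo && @memo.key?(key)
    @calls += 1
    result = yield
    @memo[key] = result if @memo
    result
  end
end

input = '(' * 8 + 'a' + ')' * 8
naive = TinyPeg.new(input, memoize: false)
packrat = TinyPeg.new(input, memoize: true)
naive_end = naive.expr(0)
packrat_end = packrat.expr(0)
puts "naive: #{naive.calls} rule evaluations, packrat: #{packrat.calls}"
```

Both variants accept the input; the difference is only in how many rule evaluations happen and how much is kept in memory.
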
2013-03-08 09:11:58 utc jmettraux that reminds me of
2013-03-08 09:12:55 utc whitequark jmettraux: interesting. however, parslet/treetop have their own set of problems
2013-03-08 09:12:59 utc jmettraux neg is un-ambitious, it's just a small tool I intend to use to parse small inputs
2013-03-08 09:13:23 utc whitequark for example, creating a leaf for each single character is about as inefficient as it can get
2013-03-08 09:13:45 utc whitequark I've seen treetop work for seconds on kilobyte-sized inputs. I'm not sure why anyone would want to use any of the ruby PEG parsers for any practical work.
2013-03-08 09:14:41 utc whitequark jmettraux: un-ambitious indeed; it just reminded me of a similarly small parsing library I've tried (and failed) to develop, and I've wanted to share my thoughts
2013-03-08 09:15:00 utc jmettraux ok
2013-03-08 09:17:40 utc jmettraux what do you parse usually?
2013-03-08 09:18:19 utc whitequark jmettraux: that toy project I've been writing was a self-contained SQL database in a single .rb file, storing the data after __END__
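
For readers unfamiliar with the trick: Ruby stops reading a script at a line containing only `__END__`, and the rest of the file is available to the main script through the `DATA` constant. The sketch below is not the project's code. The helper names are hypothetical, and it works on a temporary file so that it runs anywhere. It only shows the basic idea of keeping records after the marker and appending to them.

```ruby
require 'tempfile'

# Hypothetical helpers: everything after the first line that is exactly
# "__END__" is treated as the data section of the script file.
MARKER = "__END__\n"

def read_records(path)
  source = File.read(path)
  index = source.index(/^__END__\n/)
  return [] unless index
  source[(index + MARKER.length)..-1].lines.map(&:chomp)
end

def append_record(path, record)
  # Appending at end-of-file leaves the code before __END__ untouched.
  File.open(path, 'a') { |file| file.puts(record) }
end

# Stand-in for the single .rb file holding both code and data.
script = Tempfile.new(['end_marker_sketch', '.rb'])
script.write("puts DATA.read\n__END__\nalice\n")
script.close

append_record(script.path, 'bob')
records = read_records(script.path)
puts records.inspect
```

A real store would also need the transactional appends and atomic replacement described later in the conversation; none of that is attempted here.
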
2013-03-08 09:18:31 utc jmettraux cool
2013-03-08 09:18:55 utc whitequark however, my unfinished prototypes were quite nice and if they actually worked in all cases I'd write a ruby parser on top of them
2013-03-08 09:19:16 utc whitequark I should elaborate
2013-03-08 09:19:21 utc jmettraux 100% ruby
2013-03-08 09:19:38 utc whitequark jmettraux: yeah
2013-03-08 09:19:55 utc whitequark see, we currently have ruby_parser, it's 100% ruby but is also quite a mess
2013-03-08 09:20:21 utc whitequark the lexer is a nightmare. I rewrote it in Ragel, but the performance degraded 2x and it's not cool
2013-03-08 09:20:49 utc whitequark the reason is that it used to use regexps, and ragel requires its input to be a stream of numbers, all unicode characters mapped to 128 or 255
2013-03-08 09:21:13 utc whitequark the mapping is obviously not very fast, and this is made worse by the fact that oniguruma is optimized as hell and is written in C
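
A rough sketch of the kind of mapping described here, under the assumption that ASCII code points pass through unchanged and every other character collapses to a single placeholder value (128 in this sketch). The function name is hypothetical, and this is not the actual ruby_parser or Ragel glue code.

```ruby
# Assumption: code points below 128 are kept as-is, everything else is
# replaced by 128 so the state machine only ever sees small integers.
NON_ASCII_PLACEHOLDER = 128

def to_machine_input(source)
  source.each_codepoint.map do |codepoint|
    codepoint < 128 ? codepoint : NON_ASCII_PLACEHOLDER
  end
end

mapped = to_machine_input("a\u00e9{")
puts mapped.inspect
```

The per-character block call in pure Ruby is where the cost comes from, whereas a regexp match runs entirely inside the C regexp engine.
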
2013-03-08 09:21:33 utc whitequark so. even with a proper lexer, we still have the problem that lexer and parser have to communicate bidirectionally
2013-03-08 09:22:14 utc whitequark because {, for example, can mean three things in three different contexts, which a LALR parser cannot disambiguate (I think?), and thus it instead communicates its intent to the lexer
2013-03-08 09:22:21 utc whitequark which emits one of the tokens
2013-03-08 09:22:28 utc whitequark tLBRACE, tLBRACE2 and smth else
2013-03-08 09:22:36 utc whitequark tLCURLY? probably.
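
The parser-to-lexer feedback described above can be sketched roughly as follows. The state names and the state-to-token table are illustrative assumptions, and `tLBRACE_ARG` is a guess at the third token that the chat leaves unnamed. The point is only that the same `{` character yields a different token depending on a state the parser sets.

```ruby
require 'strscan'

# Hypothetical lexer: the parser assigns `state` before asking for the next
# token, and the lexer picks the brace token accordingly.
class BraceLexer
  BRACE_TOKENS = {
    expr_beg: :tLBRACE,     # assumed: an expression may start here (hash literal)
    expr_end: :tLCURLY,     # assumed: after a complete expression (block)
    cmd_arg:  :tLBRACE_ARG  # assumed: after a command argument
  }.freeze

  attr_accessor :state

  def initialize(source)
    @scanner = StringScanner.new(source)
    @state = :expr_beg
  end

  def next_token
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?

    if @scanner.scan(/\{/)
      [BRACE_TOKENS.fetch(@state), '{']
    elsif @scanner.scan(/\w+/)
      [:tIDENTIFIER, @scanner.matched]
    else
      [:tOTHER, @scanner.getch]
    end
  end
end

lexer = BraceLexer.new('{ foo {')
first = lexer.next_token   # lexed in the initial :expr_beg state
name = lexer.next_token
lexer.state = :expr_end    # the "parser" tells the lexer what it expects
second = lexer.next_token
puts [first, name, second].inspect
```
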
2013-03-08 09:23:20 utc whitequark oh, and also lexer has three of its own stack states, and I'm not even going to explain how they work, because I've no idea, and everywhere I've seen LALR ruby parsers they were just cargo cult copied
2013-03-08 09:23:58 utc jmettraux quick parenthesis/question (please go back to your initial flow after the answer): your target (post __END__) is to have SQL?
2013-03-08 09:25:01 utc whitequark jmettraux: no, the data storage. see, I could design a way to transactionally add records of pre-known size, then a way to atomically replace older records, then make a directory of a set of records and put it in a record... eventually I'll get to schema, data and indexes
2013-03-08 09:26:09 utc whitequark ok. so. back to ruby
2013-03-08 09:28:08 utc whitequark my toy SQL parser was designed like this: it had a way to describe its grammar, kinda like neg does it, and it precompiled the rules into 'lookahead tables'
2013-03-08 09:28:47 utc whitequark it was basically an attempt at implementing a LALR(n) parser generator, kinda like bison, but with one very important difference: unlike bison, which used fixed tokens, emitted by lexer, and having some data attached to them
2013-03-08 09:29:21 utc whitequark I've had regexps in place of tokens, which could capture groups when they matched, and these groups were passed to the code handling the rule (or the result was just stuffed in AST)
2013-03-08 09:31:32 utc whitequark so it was kind of like the current situation with ruby parser/ruby lexer, but without all the complexity of separate parser/lexer, without slowness of ragel (and its quirks), and without slowness of PEG, as this was just a LALR parser with linear time complexity by input length and linear space complexity by input parse tree depth
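
A minimal sketch of the "regexps in place of tokens" idea, with loud caveats: the class and rule names are hypothetical, and instead of precompiled LALR lookahead tables this version simply tries the rules in order, so it shows only how captured groups can flow from the terminals into the rule's handler.

```ruby
# Hypothetical rule-based recognizer: each rule is a sequence of regexps
# (the "tokens") plus a block that receives all captured groups.
class RegexpRuleParser
  Rule = Struct.new(:patterns, :action)

  def initialize
    @rules = []
  end

  def rule(*patterns, &action)
    @rules << Rule.new(patterns, action)
  end

  def parse(input)
    @rules.each do |rule|
      captures = match_sequence(input, rule.patterns)
      return rule.action.call(*captures) if captures
    end
    raise ArgumentError, "no rule matches #{input.inspect}"
  end

  private

  # Matches the patterns one after another, skipping leading whitespace.
  # Returns the collected capture groups, or nil if the rule does not
  # consume the whole input.
  def match_sequence(input, patterns)
    pos = 0
    captures = []
    patterns.each do |pattern|
      match = /\G\s*#{pattern}/.match(input, pos)
      return nil unless match
      captures.concat(match.captures)
      pos = match.end(0)
    end
    pos == input.length ? captures : nil
  end
end

parser = RegexpRuleParser.new
parser.rule(/SELECT/i, /(\w+)/, /FROM/i, /(\w+)/) { |column, table| [:select, column, table] }
parser.rule(/DELETE/i, /FROM/i, /(\w+)/) { |table| [:delete, table] }

select_ast = parser.parse('SELECT name FROM users')
delete_ast = parser.parse('delete from logs')
puts select_ast.inspect, delete_ast.inspect
```

There is no separate lexer here, so no token type ever has to be decided out of context; the cost is that rule selection in this sketch is plain ordered trial rather than table-driven.
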
2013-03-08 09:33:28 utc whitequark ok, enough talk, time to show the code
2013-03-08 09:36:51 utc whitequark jmettraux:
2013-03-08 09:37:05 utc whitequark it doesn't really work unfortunately :/ It used to be, but I've lost that snapshot
2013-03-08 09:37:09 utc whitequark *used to
2013-03-08 09:42:03 utc jmettraux sweet anyway
2013-03-08 09:43:51 utc jmettraux whitequark: off-topic question, I see you're in the Evil Martians team, do you guys happen to have someone working from Bali? I met him in a Ruby conf in Singapore last year, maybe my memory is bad...
2013-03-08 09:45:01 utc whitequark jmettraux: all over the world
2013-03-08 09:46:01 utc jmettraux ah, I think he's the guy:
2013-03-08 09:46:14 utc whitequark jmettraux: very likely
2013-03-08 09:46:58 utc jmettraux do you plan to give your PicoDB prototype another round of trying?
2013-03-08 09:48:42 utc whitequark jmettraux: that is quite likely
2013-03-08 09:48:57 utc whitequark unfortunately I've no idea how to fix the parser
2013-03-08 09:49:17 utc whitequark it would be really neat if someone helped me with it
2013-03-08 09:50:12 utc jmettraux sorry, neg is too toyesque
2013-03-08 09:51:01 utc jmettraux or like, sorry that neg is too toyesque
2013-03-08 09:55:00 utc whitequark that's a pity
2013-03-08 09:55:06 utc whitequark maybe I should just study bison better...
2013-03-08 10:00:40 utc jmettraux especially if you want the perf
2013-03-08 10:02:21 utc whitequark jmettraux: no, I'm not going to *use* it
2013-03-08 10:02:42 utc whitequark or, well, I already do--the current ruby_parser is written with racc--but I'm not happy about that
2013-03-08 10:02:51 utc whitequark it is very poorly suited for parsing ruby
2013-03-08 10:06:37 utc jmettraux internal dsl instead of sql? sql-like internal dsl?
2013-03-08 10:08:01 utc whitequark jmettraux: hm?
2013-03-08 10:08:11 utc whitequark picodb is an SQL database; it accepts SQL
2013-03-08 10:08:39 utc whitequark the parser generator of picodb is better suited for parsing ruby than bison, thus, if it worked, I'd also use it for parsing ruby
2013-03-08 10:09:57 utc jmettraux why do you need to parse ruby?
2013-03-08 10:10:09 utc whitequark jmettraux:
2013-03-08 10:10:46 utc whitequark ruby_parser is, well, crap. it doesn't report column numbers, line numbers are, quote, slightly off, unquote, and it dies on some pathological (for it) inputs
2013-03-08 10:21:18 utc jmettraux thanks for explaining it all
2013-03-08 10:26:03 utc jmettraux any news from that Jacob guy working on a Ruby AST builder on top of Ripper, the one who commented in ?
2013-03-08 10:26:57 utc whitequark jmettraux: no
2013-03-08 10:27:05 utc whitequark why would you want to use ripper?
2013-03-08 10:27:18 utc jmettraux 1.9
2013-03-08 10:27:23 utc whitequark so?
2013-03-08 10:27:34 utc jmettraux comes with the beast
2013-03-08 10:27:38 utc whitequark so?
2013-03-08 10:27:57 utc whitequark ripper does not detect errors
2013-03-08 10:28:03 utc jmettraux it's more tested than ruby_parser
2013-03-08 10:28:04 utc whitequark and is unportable
2013-03-08 10:28:14 utc whitequark it _doesn't work_.
2013-03-08 10:28:22 utc whitequark because a parser which cannot report an error is not a parser, it's bullshit
2013-03-08 10:28:30 utc jmettraux my hope is that, if it's 1.9, other rubies will provide it somehow
2013-03-08 10:28:35 utc whitequark no
2013-03-08 10:28:50 utc whitequark this is not going to happen. ripper is internal undocumented API which depends on implementation details.
2013-03-08 10:28:58 utc jmettraux great
2013-03-08 10:29:38 utc whitequark use RP. I'm working first and foremost on improving RP.
2013-03-08 10:30:32 utc jmettraux I'm using it here and there, I have to find time to move those bits and pieces to the latest and greatest RP
2013-03-08 10:31:13 utc jmettraux this roughness in the ride keeps my eyes open to alternatives
2013-03-08 10:31:34 utc whitequark well, ripper is far worse than RP anyway
2013-03-08 10:31:49 utc whitequark you could check out JRubyParser, as headius suggests
2013-03-08 10:32:58 utc jmettraux so you're joining the RP dev team?
2013-03-08 10:33:11 utc whitequark jmettraux: hell no. zenspider is a dick
2013-03-08 10:33:20 utc whitequark if he merges my changes, good
2013-03-08 10:33:25 utc whitequark if he doesn't, there will be a fork
2013-03-08 10:34:07 utc whitequark well, there will be a foundry-internal fork anyway, but that is only tangentially related
2013-03-08 10:34:13 utc whitequark (foundry = that language I'm developing)
2013-03-08 10:34:28 utc whitequark (it has to add type annotations, etc)
2013-03-08 10:34:39 utc jmettraux gloomy ruby parsing landscape
2013-03-08 10:35:05 utc whitequark parsing is, well, ghetto
2013-03-08 10:35:12 utc whitequark at least you don't have to deal with C++
2013-03-08 10:35:30 utc whitequark a C++ compiler spends something like 70% of its time parsing the source
2013-03-08 10:35:36 utc whitequark *runtime
2013-03-08 10:45:31 utc jmettraux I hope you'll find the time to release and maintain your fork of ruby_parser
2013-03-08 10:48:57 utc whitequark jmettraux: that is likely, assuming I finish it at all (I probably will)
2013-03-08 11:45:13 utc jmettraux whitequark: so, all being said, I wish you a good evening!
2013-03-08 11:46:54 utc whitequark jmettraux: thanks!