| 2013-03-08 09:07:58 utc | jmettraux | whitequark: hello and welcome to #ruote |
| 2013-03-08 09:09:15 utc | whitequark | jmettraux: hello |
| 2013-03-08 09:09:19 utc | whitequark | I've been reading about neg |
| 2013-03-08 09:10:21 utc | whitequark | I've tried to implement something similar, but with one important difference: I wanted PEG syntax, LR semantics and PEG non-ambiguity all at once |
| 2013-03-08 09:10:39 utc | whitequark | (I'm fairly sure that at least the first two goals together can be achieved) |
| 2013-03-08 09:11:32 utc | whitequark | the problem with PEG/packrat parsers is either the exponential time requirement on pathological inputs or enormous space requirements. (I'm not sure about exact complexity, but this is a significant practical problem.) |
| 2013-03-08 09:11:58 utc | jmettraux | that reminds me of https://github.com/kschiess/parslet/pull/51 |
| 2013-03-08 09:12:55 utc | whitequark | jmettraux: interesting. however, parslet/treetop have their own set of problems |
| 2013-03-08 09:12:59 utc | jmettraux | neg is un-ambitious, it's just a small tool I intend to use to parse small inputs |
| 2013-03-08 09:13:23 utc | whitequark | for example, creating a leaf for each single character is about as inefficient as it can get |
| 2013-03-08 09:13:45 utc | whitequark | I've seen treetop work for seconds on kilobyte-sized inputs. I'm not sure why anyone would want to use any of the ruby PEG parsers for any practical work. |
| 2013-03-08 09:14:41 utc | whitequark | jmettraux: un-ambitious indeed; it just reminded me of a similarly small parsing library I've tried (and failed) to develop, and I wanted to share my thoughts |
| 2013-03-08 09:15:00 utc | jmettraux | ok |
| 2013-03-08 09:17:40 utc | jmettraux | what do you parse usually? |
| 2013-03-08 09:18:19 utc | whitequark | jmettraux: that toy project I've been writing was a self-contained SQL database in a single .rb file, storing the data after __END__ |
| 2013-03-08 09:18:31 utc | jmettraux | cool |
| 2013-03-08 09:18:55 utc | whitequark | however, my unfinished prototypes were quite nice, and if they actually worked in all cases I'd write a ruby parser on top of them |
| 2013-03-08 09:19:16 utc | whitequark | I should elaborate |
| 2013-03-08 09:19:21 utc | jmettraux | 100% ruby |
| 2013-03-08 09:19:38 utc | whitequark | jmettraux: yeah |
| 2013-03-08 09:19:55 utc | whitequark | see, we currently have ruby_parser, it's 100% ruby but is also quite a mess |
| 2013-03-08 09:20:21 utc | whitequark | the lexer is a nightmare. I rewrote it in Ragel, but the performance degraded 2x and it's not cool |
| 2013-03-08 09:20:49 utc | whitequark | the reason is that it used to use regexps, and ragel requires its input to be a stream of numbers, all unicode characters mapped to 128 or 255 |
| 2013-03-08 09:21:13 utc | whitequark | the mapping is obviously not very fast, and this is made worse by the fact that oniguruma is optimized as hell and is written in C |
| 2013-03-08 09:21:33 utc | whitequark | so. even with a proper lexer, we still have the problem that lexer and parser have to communicate bidirectionally |
| 2013-03-08 09:22:14 utc | whitequark | because {, for example, can mean three things in three different contexts, which a LALR parser cannot disambiguate (I think?), and thus it instead communicates its intent to the lexer |
| 2013-03-08 09:22:21 utc | whitequark | which emits one of the tokens |
| 2013-03-08 09:22:28 utc | whitequark | tLBRACE, tLBRACE2 and smth else |
| 2013-03-08 09:22:36 utc | whitequark | tLCURLY? probably. |
| 2013-03-08 09:23:20 utc | whitequark | oh, and also lexer has three of its own stack states, and I'm not even going to explain how they work, because I've no idea, and everywhere I've seen LALR ruby parsers they were just cargo cult copied |
| 2013-03-08 09:23:58 utc | jmettraux | quick parenthesis/question (please go back to your initial flow after the answer): your target (post __END__) is to have SQL? |
| 2013-03-08 09:25:01 utc | whitequark | jmettraux: no, the data storage. see, I could design a way to transactionally add records of pre-known size, then a way to atomically replace older records, then make a directory of a set of records and put it in a record... eventually I'll get to schema, data and indexes |
| 2013-03-08 09:26:09 utc | whitequark | ok. so. back to ruby |
| 2013-03-08 09:28:08 utc | whitequark | my toy SQL parser was designed like this: it had a way to describe its grammar, kinda like neg does it, and it precompiled the rules into 'lookahead tables' |
| 2013-03-08 09:28:47 utc | whitequark | it was basically an attempt at implementing a LALR(n) parser generator, kinda like bison, but with one very important difference: unlike bison, which uses fixed tokens emitted by a lexer, with some data attached to them, |
| 2013-03-08 09:29:21 utc | whitequark | I had regexps in place of tokens, which could capture groups when they matched, and these groups were passed to the code handling the rule (or the result was just stuffed in the AST) |
| 2013-03-08 09:31:32 utc | whitequark | so it was kind of like the current situation with ruby parser/ruby lexer, but without all the complexity of a separate parser/lexer, without the slowness of ragel (and its quirks), and without the slowness of PEG, as this was just a LALR parser with time linear in the input length and space linear in the depth of the parse tree |
| 2013-03-08 09:33:28 utc | whitequark | ok, enough talk, time to show the code |
| 2013-03-08 09:36:51 utc | whitequark | jmettraux: https://gist.github.com/whitequark/4bca914a95672593de22 |
| 2013-03-08 09:37:05 utc | whitequark | it doesn't really work unfortunately :/ It used to be, but I've lost that snapshot |
| 2013-03-08 09:37:09 utc | whitequark | *used to |
| 2013-03-08 09:42:03 utc | jmettraux | sweet anyway |
| 2013-03-08 09:43:51 utc | jmettraux | whitequark: off-topic question, I see you're in the Evil Martians team, do you guys happen to have someone working from Bali? I met him at a Ruby conf in Singapore last year, maybe my memory is bad... |
| 2013-03-08 09:45:01 utc | whitequark | jmettraux: all over the world |
| 2013-03-08 09:46:01 utc | jmettraux | ah, I think he's the guy: https://github.com/gazay |
| 2013-03-08 09:46:14 utc | whitequark | jmettraux: very likely |
| 2013-03-08 09:46:58 utc | jmettraux | do you plan to give your PicoDB prototype another round of trying? |
| 2013-03-08 09:48:42 utc | whitequark | jmettraux: that is quite likely |
| 2013-03-08 09:48:57 utc | whitequark | unfortunately I've no idea how to fix the parser |
| 2013-03-08 09:49:17 utc | whitequark | it would be really neat if someone helped me with it |
| 2013-03-08 09:50:12 utc | jmettraux | sorry, neg is too toyesque |
| 2013-03-08 09:51:01 utc | jmettraux | or like, sorry that neg is too toyesque |
| 2013-03-08 09:55:00 utc | whitequark | that's a pity |
| 2013-03-08 09:55:06 utc | whitequark | maybe I should just study bison better... |
| 2013-03-08 10:00:40 utc | jmettraux | especially if you want the perf |
| 2013-03-08 10:02:21 utc | whitequark | jmettraux: no, I'm not going to *use* it |
| 2013-03-08 10:02:42 utc | whitequark | or, well, I already do--the current ruby_parser is written with racc--but I'm not happy about that |
| 2013-03-08 10:02:51 utc | whitequark | it is very poorly suited for parsing ruby |
| 2013-03-08 10:06:37 utc | jmettraux | internal dsl instead of sql? sql-like internal dsl? |
| 2013-03-08 10:08:01 utc | whitequark | jmettraux: hm? |
| 2013-03-08 10:08:11 utc | whitequark | picodb is an SQL database; it accepts SQL |
| 2013-03-08 10:08:39 utc | whitequark | picodb's parser generator is better suited to parsing ruby than bison is, so if it worked, I'd also use it for parsing ruby |
| 2013-03-08 10:09:57 utc | jmettraux | why do you need to parse ruby? |
| 2013-03-08 10:10:09 utc | whitequark | jmettraux: http://whitequark.org/blog/2012/12/06/a-language-for-embedded-developers/ |
| 2013-03-08 10:10:46 utc | whitequark | ruby_parser is, well, crap. it doesn't report column numbers, line numbers are, quote, slightly off, unquote, and it dies on some pathological (for it) inputs |
| 2013-03-08 10:21:18 utc | jmettraux | thanks for explaining it all |
| 2013-03-08 10:26:03 utc | jmettraux | any news from that Jacob guy working on a Ruby AST builder on top of Ripper, the one who commented in http://whitequark.org/blog/2012/10/02/parsing-ruby/ ? |
| 2013-03-08 10:26:57 utc | whitequark | jmettraux: no |
| 2013-03-08 10:27:05 utc | whitequark | why would you want to use ripper? |
| 2013-03-08 10:27:18 utc | jmettraux | 1.9 |
| 2013-03-08 10:27:23 utc | whitequark | so? |
| 2013-03-08 10:27:34 utc | jmettraux | comes with the beast |
| 2013-03-08 10:27:38 utc | whitequark | so? |
| 2013-03-08 10:27:57 utc | whitequark | ripper does not detect errors |
| 2013-03-08 10:28:03 utc | jmettraux | it's more tested than ruby_parser |
| 2013-03-08 10:28:04 utc | whitequark | and is unportable |
| 2013-03-08 10:28:14 utc | whitequark | it _doesn't work_. |
| 2013-03-08 10:28:22 utc | whitequark | because a parser which cannot report an error is not a parser, it's bullshit |
| 2013-03-08 10:28:30 utc | jmettraux | my hope is that, if it's 1.9, other rubies will provide it somehow |
| 2013-03-08 10:28:35 utc | whitequark | no |
| 2013-03-08 10:28:50 utc | whitequark | this is not going to happen. ripper is an internal, undocumented API which depends on implementation details. |
| 2013-03-08 10:28:58 utc | jmettraux | great |
| 2013-03-08 10:29:38 utc | whitequark | use RP. I'm working first and foremost on improving RP. |
| 2013-03-08 10:30:32 utc | jmettraux | I'm using it here and there, I have to find time to move those bits and pieces to the latest and greatest RP |
| 2013-03-08 10:31:13 utc | jmettraux | this roughness in the ride keeps my eyes open to alternatives |
| 2013-03-08 10:31:34 utc | whitequark | well, ripper is far worse than RP anyway |
| 2013-03-08 10:31:49 utc | whitequark | you could check out JRubyParser, as headius suggests |
| 2013-03-08 10:32:58 utc | jmettraux | so you're joining the RP dev team? |
| 2013-03-08 10:33:11 utc | whitequark | jmettraux: hell no. zenspider is a dick |
| 2013-03-08 10:33:20 utc | whitequark | if he merges my changes, good |
| 2013-03-08 10:33:25 utc | whitequark | if he doesn't, there will be a fork |
| 2013-03-08 10:34:07 utc | whitequark | well, there will be a foundry-internal fork anyway, but that is only tangentially related |
| 2013-03-08 10:34:13 utc | whitequark | (foundry = that language I'm developing) |
| 2013-03-08 10:34:28 utc | whitequark | (it has to add type annotations, etc) |
| 2013-03-08 10:34:39 utc | jmettraux | gloomy ruby parsing landscape |
| 2013-03-08 10:35:05 utc | whitequark | parsing is, well, ghetto |
| 2013-03-08 10:35:12 utc | whitequark | at least you don't have to deal with C++ |
| 2013-03-08 10:35:30 utc | whitequark | a C++ compiler spends something like 70% of its time parsing the source |
| 2013-03-08 10:35:36 utc | whitequark | *runtime |
| 2013-03-08 10:45:31 utc | jmettraux | I hope you'll find the time to release and maintain your fork of ruby_parser |
| 2013-03-08 10:48:57 utc | whitequark | jmettraux: that is likely, assuming I finish it at all (I probably will) |
| 2013-03-08 11:45:13 utc | jmettraux | whitequark: so, all being said, I wish you a good evening! |
| 2013-03-08 11:46:54 utc | whitequark | jmettraux: thanks! |
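
The "regexps in place of tokens" idea whitequark describes between 09:28 and 09:31 can be roughly sketched in Ruby as follows. This is not the code from the gist and it is not an LALR table construction: it simply tries each rule in order, which is enough to show regexp terminals whose capture groups are passed straight to the rule's action, with no separate lexer. The names `RULES` and `parse_statement`, and the two toy SQL rules, are hypothetical and exist only for illustration.

```ruby
# Hypothetical rule table: each rule is a sequence of regexp terminals plus
# an action that receives every captured group, in order of appearance.
RULES = {
  select: [[/SELECT\b/i, /(\w+)/, /FROM\b/i, /(\w+)/],
           ->(column, table) { [:select, column, table] }],
  delete: [[/DELETE\b/i, /FROM\b/i, /(\w+)/],
           ->(table) { [:delete, table] }]
}.freeze

# Try each rule against the whole input; the first rule whose terminals all
# match (and which consumes the entire input) wins.
def parse_statement(source)
  RULES.each_value do |terminals, action|
    rest = source
    captures = []
    matched = terminals.all? do |terminal|
      # Anchor the terminal at the current position, skipping whitespace.
      md = /\A\s*#{terminal}/.match(rest)
      next false unless md
      captures.concat(md.captures) # groups go to the action, not to a token
      rest = md.post_match
      true
    end
    return action.call(*captures) if matched && rest.strip.empty?
  end
  raise ArgumentError, "no rule matches #{source.inspect}"
end

p parse_statement("select name from users")
p parse_statement("DELETE FROM logs")
```

A real generator along the lines discussed would precompile the rules into lookahead tables instead of retrying rules from the start; the sketch only shows how capture groups can replace the data normally attached to lexer tokens.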